Scientific Progress and Accomplishments:

This  project  addresses  the  problem  of  how  to  produce  reliable  software 
that  is  also  flexible  and  cost  effective  for  the  DoD  distributed 
software  domain.  Current  and  future  DoD  software  systems  fall  into  two 
categories:  information  systems  and  warfighter  systems.  Both  kinds  of 
systems  can  be  distributed,  heterogeneous  and  network-based,  consisting 
of  a  set  of  components  running  on  different  platforms  and  working 
together  via  multiple  communication  links  and  protocols. 

We focused on "wrap and glue" technology based on a domain specific
distributed prototype model. Glue and wrappers consist of software
that bridges the interoperability gap between individual COTS/GOTS
components.  The  key  to  making  the  proposed  approach  reliable,  flexible 
and  cost  effective  is  the  automatic  generation  of  glue  and  wrappers 
based  on  a  designer's  specification.  The  proposed  "wrap  and  glue" 
approach  allows  system  designers  to  concentrate  on  the  difficult 
interoperability  problems  and  defines  solutions  in  terms  of  deeper  and 
more  difficult  interoperability  issues,  while  freeing  designers  from 
implementation  details.  The  objective  of  our  research  is  to  develop  an 
integrated  set  of  formal  models  and  methods  for  system  engineering 
automation.  These  results  will  enable  building  decision  support  tools 
for  concurrent  engineering.  Our  research  addresses  complex  modular 
systems  with  embedded  control  software  and  real-time  requirements. 

Our long-term goals are to construct an integrated set of software
tools that can improve software quality and flexibility by automating a
significant part of the process and providing substantial decision
support for the aspects that cannot be automated. The resulting
development  environment  should  be  adaptable  to  enable  (1)  maintaining 
integrated  support  in  the  presence  of  business  process  improvement,  (2) 
incorporation  of  future  improvements  in  engineering  automation  methods, 
and  (3)  specialization  to  particular  problem  domains. 

Specific tasks accomplished in FY00 include (1) the design of an
interface wrapper model that allows developers to treat distributed
objects as local objects, (2) the development of a tool to generate
Java interface wrappers from a specification written in the high-level
Prototype System Description Language (PSDL), (3) the design of a
distributed heterogeneous environment to automate the process of
integrating distributed systems, (4) a case study involving a
"wrapper and glue" solution for integrating/extending
COTS/GOTS/legacy components of the Naval Integrated Tactical
Environmental System I (NITES I), (5) the design of high-level net
models for fault detection in multistage interconnected networks, (6)
tools for assertion checking, dynamic analysis and testing of programs,
(7) application of machine learning algorithms in software development,
and (8) reliability modeling for safety critical software.
Workshop on Algorithmic and Automatic Debugging, AADEBUG'2000, Munich,
Germany, August 28-30, 2000.
ABSTRACT

This  paper  suggests  an  approach  to  the  development  of 
software  testing  and  debugging  automation  tools  based  on 
precise  program  behavior  models.  The  program  behavior 
model  is  defined  as  a  set  of  events  (event  trace)  with  two  basic 
binary  relations  over  events  -  precedence  and  inclusion,  and 
represents  the  temporal  relationship  between  actions.  A 
language  for  the  computations  over  event  traces  is  developed 
that  provides  a  basis  for  assertion  checking,  debugging 
queries,  execution  profiles,  and  performance  measurements. 

The  approach  is  nondestructive,  since  assertion  texts  are 
separated  from  the  target  program  source  code  and  can  be 
maintained  independently.  Assertions  can  capture  the 
dynamic  properties  of  a  particular  target  program  and  can 
formalize  the  general  knowledge  of  typical  bugs  and 
debugging  strategies.  An  event  grammar  provides  a  sound 
basis  for  assertion  language  implementation  via  target 
program  automatic  instrumentation. 

An  implementation  architecture  and  preliminary 
experiments  with  a  prototype  assertion  checker  for  the  C 
programming  language  are  discussed. 

Keywords 

Program  behavior  models,  events,  event  grammars, 
software  testing  and  debugging  automation. 

1  INTRODUCTION 

Program testing and debugging is still a human activity
performed largely without any adequate tools, and consuming
more than 50% of the total program development time
and effort [9]. Testing and debugging are mostly concerned
with the program run-time behavior, and developing a precise
model of program behavior becomes the first step
towards any dynamic analysis automation. In building such
a model several considerations were taken into account. The
first assumption we make is that the model is discrete, i.e.
comprises a finite number of well-separated elements. For
this reason the notion of event as an elementary unit of
action is an appropriate basis for building the whole model.
The event is an abstraction for any detectable action performed
during the program execution, such as a statement
execution, expression evaluation, procedure call, sending
and receiving a message, etc.

Actions  (or  events)  are  evolving  in  time  and  the  program 
behavior  represents  the  temporal  relationship  between 
actions.  This  implies  the  necessity  to  introduce  an  ordering 
relation  for  events.  Semantics  of  parallel  programming 
languages  and  even  some  sequential  languages  (such  as  C)  do 
not  require  the  total  ordering  of  actions,  so  partial  event 
ordering  is  the  most  adequate  method  for  this  purpose  [21]. 

Actions performed during the program execution are at
different levels of granularity; some of them include other
actions, e.g. a subroutine call event contains statement
execution events. This consideration brings the inclusion
relation into our model. Under this relationship, events can be
hierarchical objects and it becomes possible to consider
program behavior at appropriate levels of granularity.

Finally,  the  program  execution  can  be  modeled  as  a  set  of 
events  (event  trace)  with  two  basic  relations:  partial  ordering 
and  inclusion.  In  order  to  specify  meaningful  program 
behavior  properties  we  have  to  enrich  events  with  some 
attributes. 

An  event  may  have  a  type  and  some  other  attributes,  such 
as  event  duration,  program  source  code  related  to  the  event, 
program  state  associated  with  the  event  (i.e.  program  variable 
values  at  the  beginning  and  at  the  end  of  the  event),  etc.  This 
program  behavior  model  may  be  regarded  as  a  “lightweight” 
semantics  of  the  programming  language. 

The  next  problem  to  be  addressed  after  the  program 
behavior  model  is  set  up  is  the  formalism  for  specifying 
properties  of  the  program  behavior.  This  could  be  done  in 
many  different  ways,  e.g.,  by  adopting  some  kind  of  logic 
calculi  (predicate  logic,  temporal  logic).  Such  a  direction 
leads  to  tools  for  static  program  verification,  or  in  more 
pragmatic  incarnations  to  an  approach  called  model  checking 
[12]. 

Since  our  goal  is  dynamic  program  analysis  that  requires 
different  types  of  assertion  checking,  debugging  queries, 
program  execution  profiles,  and  so  on,  we  developed  the 
concept of a computation over the event trace. It seems that
this  concept  is  general  enough  to  cover  all  the  above 
mentioned  needs  in  the  unifying  framework,  and  provides 
sufficient  flexibility.  This  approach  implies  the  design  of  a 
special  programming  language  for  computations  over  the 
event  traces.  We  suggest  a  particular  language  called 
FORMAN  ([3],  [17])  based  on  a  functional  paradigm  and  the 
use  of  event  patterns  and  aggregate  operations  over  events. 
The  papers  [2],  [3],  [17]  are  based  on  our  assertion  checker 
prototype  for  a  subset  of  the  PASCAL  language.  This  paper 
describes  the  first  experience  with  an  assertion  checker  for  the 
C  programming  language.  The  implementation  of  the  C 
assertion  checker  is  based  on  source  code  automatic 
instrumentation and supports almost the complete C language
(the  most  serious  constraint  is  the  requirement  that  the  target 
program  is  contained  in  a  single  compilation  unit).  To  adjust 
to  the  specifics  of  the  C  target  language  the  FORMAN 
language  has  been  modified,  in  particular,  the  scope  construct 
(WITHIN  function-name)  and  explicit  type  cast  have  been 
added  (see  examples  in  Sec.  4). 

Patterns  describe  the  structure  of  events  with  context 
conditions.  Program  paths  can  be  described  by  path 
expressions  over  events.  All  this  makes  it  possible  to  write 
assertions  not  only  about  variable  values  at  program  points 
but  also  about  data  flow  and  control  flow  in  the  target 
program.  Assertions  can  also  be  used  as  conditions  in  rules 
which  describe  debugging  actions.  For  example,  an  error 
message  is  a  typical  action  for  a  debugger  or  consistency 
checker.  Thus,  it  is  also  possible  to  specify  debugging 
strategies. 

The  notions  of  event  and  event  type  are  powerful 
abstractions  which  make  it  possible  to  write  assertions 
independent  of  a  particular  target  program.  Such  generic 
assertions  can  be  collected  in  standard  libraries  which 
represent  general  knowledge  about  typical  bugs  and 
debugging  strategies  and  could  be  designed  and  distributed  as 
special  software  tools. 

Possible  applications  of  a  language  for  computations  over 
a  program  event  trace  include  program  testing  and  debugging, 
performance  measurement  and  modeling,  program  profiling, 
program  animation,  program  maintenance  and  program 
documentation  [5].  Even  the  traditional  debugging  method 
based  on  scattering  print  statements  across  the  source  code 
may  be  easily  implemented  as  an  appropriate  computation  on 
the event trace (see example in Sec. 4). The advantage is that
the  print  statements  are  kept  in  a  separate  file  and  the  source 
code  of  the  target  program  will  be  instrumented  automatically 
just  before  execution.  A  study  of  applying  FORMAN  to 
parallel  programming  is  presented  in  [4]. 

2  EVENTS 

FORMAN  is  based  on  a  semantic  model  of  target  program 
behavior in which the program execution is represented by a
set of events. An event occurs when some action is performed
during  the  program  execution  process.  For  instance,  a 
function  is  called,  a  statement  is  executed,  or  some  expression 
is  evaluated.  A  particular  action  may  be  performed  many 
times,  but  every  execution  of  an  action  is  denoted  by  a  unique 
event. 

Every  event  defines  a  time  interval  which  has  a  beginning 
and  an  end.  For  atomic  events,  the  beginning  and  end  points 
of  the  time  interval  will  be  the  same.  All  events  used  for 
assertion  checking  and  other  computations  over  event  traces 
must be detectable by some implementation (e.g. by an
appropriate target program instrumentation). Attributes
attached  to  events  bring  additional  information  about  event 
context,  such  as  current  variable  and  expression  values. 

In  order  to  give  some  rationale  for  our  notion  of  an  event, 
let  us  consider  a  well-known  idea  such  as  a  counter.  Usually 
the  history  of  a  variable  X  when  used  as  a  counter  looks  like: 

    X := 0; ...
    loop ...
        X := X + 1; ...
    endloop; ...

In  order  to  determine  whether  the  actual  behavior  of  the 
counter  X  matches  the  pattern  described  by  the  program 
fragment above we have to consider the following events. Let
Initialize_X denote the event of assigning 0 to the variable
X, Augment_X denote the event of incrementing X, and
Assign_X denote the event of assigning any value to the
variable  X.  The  event  Assign_X  is  a  composite  one;  it 
contains  either  Initialize_X  or  Augment_X  events.  One  could 
determine  if  X  behaves  as  a  counter  when  a  program  segment 
S  is  executed  in  the  following  way.  First,  the  sequence  A  of 
all  events  of  the  type  Assign_X  from  the  event  trace  of 
program  segment  S  has  to  be  extracted  preserving  the 
ordering  between  events.  Second,  A  has  to  be  matched  with 
the  pattern: 

    Initialize_X (Augment_X)*

where  ’*’  denotes  repetition  zero  or  more  times.  If  the 
actual  sequence  of  events  does  not  match  this  pattern  we  can 
report  an  error.  Therefore,  assertion  checking  can  be 
represented  as  a  kind  of  computation  over  a  target  program 
event  trace. 
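The matching step described above can be sketched in C. This is only an illustration, not the FORMAN implementation: the enum assign_kind and the function behaves_as_counter are hypothetical names, and the sequence A is assumed to have been extracted from the trace already, in order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds for assignments to X (names follow the text). */
enum assign_kind { INITIALIZE_X, AUGMENT_X };

/* Returns 1 if the n events in seq match Initialize_X (Augment_X)*,
   and 0 otherwise. */
int behaves_as_counter(const enum assign_kind *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    if (n == 0 || seq[0] != INITIALIZE_X)
        return 0;               /* the pattern requires a leading Initialize_X */
    for (size_t i = 1; i < n; i++)
        if (seq[i] != AUGMENT_X)
            return 0;           /* only increments may follow */
    return 1;
}
```

A result of 0 corresponds to the case where an error would be reported.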

The  program  state  (current  values  of  variables)  can  be 
considered  at  the  beginning  or  at  the  end  of  an  appropriate 
event.  This  provides  the  opportunity  to  write  assertions  about 
program  variable  values  at  different  points  in  the  program 
execution  history. 

Program  profiling  usually  is  based  on  counting  the  number 
of  events  of  some  type,  e.g.  the  number  of  statement 
executions  or  procedure  calls.  Performance  measurements 
may  be  based  on  attaching  the  duration  attribute  to  such 
events and summarizing durations of selected events.
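As a rough illustration of profiling and performance measurement as trace computations, the following C sketch counts events of one type and sums their durations. The struct event layout and the function names are assumptions made for this example; they are not taken from the prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Event types named after the event grammar used for C programs. */
enum event_type { EX_STMT, EVAL_EXPR, FUNC_CALL };

/* Hypothetical trace entry; begin and end are step-counter values. */
struct event {
    enum event_type type;
    long begin;
    long end;
};

/* Profiling: number of events of type t in the trace. */
size_t count_events(const struct event *trace, size_t n, enum event_type t)
{
    size_t count = 0;
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (trace[i].type == t)
            count++;
    return count;
}

/* Performance measurement: sum of the durations of events of type t. */
long total_duration(const struct event *trace, size_t n, enum event_type t)
{
    long total = 0;
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (trace[i].type == t)
            total += trace[i].end - trace[i].begin;
    return total;
}
```

Note that nested events of the same type overlap in time, so this simple sum counts the inner interval twice.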

3  PROGRAM  BEHAVIOR  MODEL 

FORMAN  is  intended  to  be  used  to  specify  behavior  of 
programs  written  in  some  high-level  programming  language 
which  is  called  the  target  language.  The  model  of  target 
program behavior is formally defined as a set of events (event
trace)  with  two  basic  relations,  which  may  or  may  not  hold 
between  two  arbitrary  events.  The  events  may  be  sequentially 
ordered  (PRECEDES),  or  one  of  them  might  be  included  in 
another  composite  event  (IN).  For  each  pair  of  events  in  the 
event  trace  no  more  than  one  of  these  relations  can  be 
established. 

In  order  to  define  the  behavior  model  for  a  particular  target 
language,  types  of  events  are  introduced.  Each  event  belongs 
to one or more predefined event types, which are induced
by  target  language  abstract  syntax  (e.g.  execute-statement, 
send-message,  receive-message)  or  by  target  language 
semantics  (e.g.,  rendezvous,  wait,  put-message-in-queue). 

The  target  program  execution  model  is  defined  by  an  event 
grammar.  The  event  may  be  a  compound  object,  in  which 
case  the  grammar  describes  how  the  event  is  split  into  other 
event  sequences  or  sets.  The  event  grammar  is  a  set  of  axioms 
that  describe  possible  patterns  of  basic  relations  between 
events  of  different  types  in  the  program  execution  history;  it 
is  not  intended  to  be  used  for  parsing  an  actual  event  trace. 

The rule A :: B C establishes that if an event a of the
type A occurs in the trace of a program, it is necessary that
events b and c of types B and C also exist, such that the
relations b IN a, c IN a, and b PRECEDES c hold.
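One way to make the two basic relations concrete is to treat each event as a time interval. The C sketch below does this under a stated assumption: begin and end are step-counter values and no two distinct events share a timestamp, so strict comparisons suffice and PRECEDES and IN can never both hold for the same ordered pair. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical event record: begin and end are step-counter values. */
struct event {
    long begin;
    long end;
};

/* a PRECEDES b: a ends before b begins. */
int precedes(struct event a, struct event b)
{
    assert(a.begin <= a.end && b.begin <= b.end);
    return a.end < b.begin;
}

/* a IN b: the interval of a lies strictly inside the interval of b. */
int in(struct event a, struct event b)
{
    assert(a.begin <= a.end && b.begin <= b.end);
    return b.begin < a.begin && a.end < b.end;
}

/* Checks the rule A :: B C for concrete events a, b and c. */
int satisfies_rule(struct event a, struct event b, struct event c)
{
    return in(b, a) && in(c, a) && precedes(b, c);
}
```

For example, with a = {1, 10}, b = {2, 4} and c = {5, 9}, satisfies_rule(a, b, c) holds, while swapping b and c violates the PRECEDES requirement.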

For  the  C  language  assertion  checker  prototype  we  have 
defined  the  following  simple  event  grammar. 

(Axiom 1) execute_program::
              ( ex_stmt | eval_expr )*

(Axiom 2) ex_stmt::
              ( ex_stmt | eval_expr )*

(Axiom 3) eval_expr:: func_call |
              eval_expr+ destination? |
              { eval_expr }+

(Axiom 4) func_call::
              { eval_expr }* ex_stmt*

Axiom  1  states  that  the  program  execution  event  contains 
(the  IN  relation)  a  set  of  zero  or  more  ordered  (w.r.t.  relation 
PRECEDES) events of the types execute-statement or
evaluate-expression.

Axiom  2  states  the  same  fact  about  the  execute_statement 
event.  For  example,  the  event  of  executing  a  composite 
statement such as if-then-else will contain an event
eval_expr for condition evaluation and a sequence of zero
or  more  events  for  the  corresponding  THEN  or  ELSE  branch 
execution.  If  a  statement  has  a  label  attached,  the  label 
traversal  itself  is  considered  as  an  empty  statement  execution 
event. 

Axiom  3  describes  the  possible  structure  of  an  expression 
evaluation  event:  it  may  contain  a  function  call  event  or  may 
be  an  ordered  sequence  of  other  expression  evaluation  events 
(e.g. for a “comma” expression). The assignment expression
evaluation contains the event destination which is
distinguished because it is of special importance for
assertion checking. In our model we have avoided any
assumptions about the ordering of argument evaluation for
binary operations, such as ‘+’ or ‘*’, since the C language
semantics leaves this unspecified [18]. The metaexpression
{ eval_expr }+ denotes a set of one or more events of the
type eval_expr without any ordering relationship.

Axiom  4  describes  the  structure  of  a  function  call  event 
which starts with a (possibly empty) set of unordered events
for  actual  parameter  evaluation  followed  by  the  function  body 
execution  events. 

The  order  of  event  occurrences  reflects  the  semantics  of 
the  target  language.  When  performing  an  assignment 
statement,  first  the  right-hand  part  is  evaluated  and  after  this 
the  destination  event  occurs  (which  denotes  the  assignment 
event  itself).  The  event  grammar  makes  FORMAN  suitable 
for  automatic  source  code  instrumentation  to  detect  all 
necessary  events. 

An  event  has  attributes,  such  as  the  source  text  fragment 
from  the  corresponding  target  program,  current  values  of 
target  program  variables  and  expressions  at  the  beginning  and 
at the end of the event, the duration of the event, a previous path
(i.e.  set  of  events  preceding  the  event  in  the  target  program 
execution  history),  etc. 

FORMAN  supplies  a  means  for  writing  assertions  about 
events  and  event  sequences  and  sets.  These  include 
quantifiers  and  other  aggregate  operations  over  events,  e.g., 
sequence,  bag  and  set  constructors,  boolean  operations  and 
operations  of  the  target  language  to  write  assertions  about 
target  program  variables. 

Events  can  be  described  by  patterns  which  capture  the 
structure of an event and its context conditions. Program paths can
be  described  by  regular  path  expressions  over  events. 

4  EXAMPLES  OF  DEBUGGING  RULES 

In  general,  a  debugging  rule  performs  some  actions  that 
may  include  computations  over  the  target  program  event 
trace.  The  aim  is  to  generate  informative  messages  and  to 
provide  the  user  with  some  values  obtained  from  the  trace  in 
order  to  detect  and  localize  bugs.  Rules  can  provide  dialog  to 
the  user  as  well.  An  assertion  is  a  boolean  expression  that  may 
contain quantifiers and sequencing constraints over events.

Assertions  can  be  used  as  conditions  in  the  rules 
describing  actions  that  can  be  performed  if  an  assertion  is 
satisfied  or  violated.  A  debugging  rule  has  the  form: 

assertion  SAY  (expression  sequence) 

ONFAIL  SAY  (expression  sequence) 

The  presence  of  metavariables  in  the  assertion  makes  it 
possible  to  use  FORMAN  as  a  debugger’s  query  language. 
The  evaluation  of  an  assertion  is  interrupted  when  it  becomes 
clear  that  the  final  value  will  be  False  (or  True),  and  the 
current  values  of  metavariables  can  be  used  to  generate 
readable  and  informative  messages. 

We  will  use  as  an  example  of  a  C  program  the  Simple 
Tokenizer  program  described  in  [25].  This  program  reads  a 
text file until the special symbol ‘.’ (dot) is read, recognizes
small  integers,  identifiers,  and  some  predefined  key  words, 
skips  spaces  and  PASCAL-like  comments,  prints  the  input 
text  with  line  numbers  attached  before  each  line,  splits  the 
output  into  pages  with  a  page  header  on  the  top  of  each  page 
(including  page  number),  and  reports  each  token  recognized. 
Unrecognized  symbols  are  printed  as  ERROR  tokens.  The 
source  code  contains  542  lines  of  code  (including  some  of  our 
updates  and  comments).  The  following  list  of  function 
prototypes  used  in  the  Simple  Tokenizer  gives  some  idea  of 
the  architecture. 

void init_scanner(char *name);
void init_page_header(char *name);
BOOLEAN get_source_line();
void get_char();
void skip_blanks();
void skip_comment();
void get_token();
void get_word();
BOOLEAN is_reserved_word();
void get_number();
void get_special();
void open_source_file(char *name);
void close_source_file();
void print_line(char line[]);
void print_token();
void print_page_header();
void quit_scanner();

The  input  text  file  for  Simple  Tokenizer  used  for  running 
the  following  examples  contained  150  lines  of  text  with  a 
total  of  454  tokens.  The  corresponding  output  contained  13 
pages with a maximum of 50 lines per page (including the input
lines and messages about tokens recognized, each on a
separate line of output).


Example  of  a  debugging  query. 

In order to obtain the history of a global variable
page_number the following computation over the event
trace can be performed. The WITHIN construct indicates the
scope of the trace computations defined by this rule. The rule
condition is TRUE, and as a side effect the entire history of
variable page_number is shown. The [ ... ] list
constructor defines a loop over the entire program event trace
(execute_program event). All events matching the
pattern func_call IS printf (i.e. events of the type
func_call and function name ‘printf’) executed within the
body of print_page_header function are selected from
the trace and the function VALUE is applied to them. The
metavariable C holds the event func_call under
consideration. The resulting sequence consists of variable
page_number values at the end of each event captured by
metavariable C during the program execution.

WITHIN print_page_header
TRUE
SAY( 'The history of page_number variable values is: '
     [ C: func_call IS 'printf'
          FROM execute_program
          APPLY VALUE(int) (AT C page_number) ] );
END

When  executed  on  our  prototype  the  following  output  is 
produced: 

The  history  of  page_number  variable  values 
is:  1  2  3  4  5  6  7  8  9  10  11  12  13 

This  debugging  rule  provides  a  slice  of  the  program 
execution  history  containing  the  trace  of  particular  variable 
values.  The  matter  of  interest  may  be,  for  instance,  to  check 
whether  the  values  in  the  variable  history  are  arranged  in 
ascending  order. 
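A check of this kind is easy to state once the history has been extracted. The C sketch below assumes the values are already available as an array; the function name is hypothetical and "ascending" is read here as strictly ascending.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every value is strictly greater than its predecessor. */
int is_strictly_ascending(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (values[i - 1] >= values[i])
            return 0;           /* found a pair that is out of order */
    return 1;
}
```

Applied to the page_number history shown above (1 through 13), the function returns 1.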


Example of assertion checking.

Let us write and check the assertion: “There exists an input
line with length exceeding some maximum, say 10.” The
program snippet containing the function get_source_line
looks like:

BOOLEAN get_source_line()
{ char print_buffer[MAX_SOURCE_LINE_LENGTH + 9];
  if ((fgets(source_buffer,
             MAX_SOURCE_LINE_LENGTH,
             source_file)) != NULL) {
      ++line_number;
  Get_Line:
      sprintf(print_buffer, "%4d %d: %s",
              line_number, level, source_buffer);
      print_line(print_buffer);
      return (TRUE);
  }
  else return (FALSE); }

Traversal of a label is an event of the type ex_stmt, and
we can check the value of a C expression
strlen(source_buffer) > 10 after this event.

WITHIN get_source_line
EXISTS L: ex_stmt IS 'Get_Line:'
          FROM execute_program
          VALUE(int)(AT L strlen(source_buffer) > 10)
SAY('Too long input line detected at stmt')
SAY(L)
SAY('It is '
    VALUE(int) (AT L strlen(source_buffer))
    'characters long')
ONFAIL SAY('No long input lines detected');

We check whether the expression
strlen(source_buffer) > 10 is not equal to 0 for each
event L. When the assertion is satisfied for the first time, the
assertion  evaluation  terminates  and  the  current  value  of  the 
metavariable  L  can  be  used  for  message  output.  In  order  to 
make  error  messages  more  informative,  the  value  of  a 
metavariable  when  printed  by  the  SAY  clause  is  shown  in  the 
form: 

    event-type :> event-source-text
    source_line_number within function_name
    Time= event-begin-time .. event-end-time

Event  begin  and  end  times  in  this  prototype 
implementation  are  simply  values  of  the  step  counter. 

When executed on our prototype this assertion check
yields the following output.

Too long input line detected at stmt
ex_stmt :> 'Get_Line:' source line 460
within function get_source_line
Time= 95 .. 96
It is 20 characters long
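The early termination of an EXISTS assertion, with the witness kept for the message, can be imitated in plain C. The sketch below is not how the prototype works internally; it assumes the input lines are already held in an array, and first_long_line is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first line longer than limit, or -1 if none. */
long first_long_line(const char *const *lines, size_t n, size_t limit)
{
    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strlen(lines[i]) > limit)
            return (long)i;     /* witness found: stop evaluating */
    return -1;                  /* no witness: the ONFAIL case */
}
```

The returned index plays the role of the metavariable L: it identifies the event at which the assertion first became true.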

Example of run-time statistics gathering.

It  is  hard  to  measure  real  execution  time  of  a  heavily 
instrumented  target  program,  although  the  simulated  time 
measurement  may  be  performed  given  that  events  may  have 
some  duration  attributes  predefined.  In  order  to  obtain  the 
actual  number  of  function  calls  executed,  number  of  function 
get_source_line  calls,  and  number  of  tokens 
recognized  by  the  Simple  Tokenizer,  the  following  query  can 
be  performed: 

TRUE
SAY('Total function calls'
    CARD[ ALL func_call
          FROM execute_program ] )
SAY('Total function get_source_line calls'
    CARD[ func_call IS get_source_line
          FROM execute_program ] )
SAY('Total tokens recognized'
    CARD[ ALL func_call IS get_token
          FROM execute_program ]
    ', among them '
    CARD[ ALL F: func_call &
          SOURCE_TEXT(F) == 'get_token'
          AND VALUE(int)(AT F token == ERROR)
          FROM execute_program ]
    'ERROR tokens detected' );

The  CARD  operator  returns  the  number  of  items  selected 
by  the  aggregate  operation,  i.e.  the  number  of  events 
matching  the  pattern  in  the  aggregate  operation  body.  The 
ALL  option  in  the  aggregate  operation  indicates  that  all  nested 
events of the type func_call should be taken into account.
The  pattern  in  the  third  aggregate  operation  provides  an 
example  of  a  complex  event  pattern  with  a  context  condition 
attached.  The  scope  of  this  trace  computation  is  the  entire 
program  trace.  After  execution  on  our  prototype  the 
following output is obtained.

Total  function  calls  6802 

Total  function  get_source_line  calls  150 

Total  tokens  recognized  454,  among  them  37 
ERROR  tokens  detected 

Example  of  path  expression  checking. 

Regular  expressions  over  event  patterns  may  describe 
sequences  of  events  extracted  from  the  event  trace.  The 
following assertion checks whether function get_token and
print_token calls appear in a certain order. The sequence of events
satisfying the pattern X: func_call & SOURCE_TEXT(X) ==
‘get_token’ OR SOURCE_TEXT(X) == ‘print_token’ is
selected from the entire event trace and matched against the
path expression (func_call IS ‘get_token’ func_call IS
‘print_token’)+. A message is produced with information
about the pattern matching results.

[ X: func_call & SOURCE_TEXT(X) == 'get_token' OR
                 SOURCE_TEXT(X) == 'print_token'
     FROM execute_program ]
SATISFIES ( func_call IS 'get_token'
            func_call IS 'print_token' )+
SAY('function calls follow the pattern
     (get_token print_token)+ ')
ONFAIL SAY('pattern
     (get_token print_token)+
     is violated');
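For this particular path expression the match can be written by hand. The following C sketch assumes the selected events have been reduced to an ordered array of function names; the function name is hypothetical, and a general implementation would need a regular-expression matcher over event patterns instead of this fixed check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if calls matches (get_token print_token)+, and 0 otherwise. */
int matches_get_print_pairs(const char *const *calls, size_t n)
{
    assert(calls != NULL || n == 0);
    if (n == 0 || n % 2 != 0)
        return 0;               /* '+' needs at least one complete pair */
    for (size_t i = 0; i < n; i += 2) {
        if (strcmp(calls[i], "get_token") != 0)
            return 0;
        if (strcmp(calls[i + 1], "print_token") != 0)
            return 0;
    }
    return 1;
}
```

A return value of 0 corresponds to the ONFAIL branch of the rule.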

Example  of  instrumenting  the  target  source  code  with 
print  statements. 

Suppose we want to insert print statements in the target source code
to print at run time the value of input strings with
length exceeding 10 and the corresponding line numbers. Values
of interest are available in the global variables
source_buffer and line_number, respectively. The
following debugging rule performs this function.

WITHIN get_source_line
FOREACH L1: ex_stmt IS 'Get_Line:'
            FROM execute_program
    VALUE(int)
      ( AT L1 strlen(source_buffer) > 10 ?
              printf("long line!!!\n%s\n", source_buffer) : 1 )
    AND
    VALUE(int)
      ( AT L1
              printf("line_number=%d\n", line_number) );
END

Formally, this rule triggers an assertion check, which
will  be  successful  since  the  C  expression  involved  yields  a 
non-zero  value  (representing  Boolean  TRUE);  as  a  side  effect 
the  print  statements  are  executed  at  run  time.  This  debugging 
rule  has  two  aspects  worthy  of  notice.  First,  the 
instrumentation  code  is  separated  from  the  target  code;  it  will 
be  inserted  automatically  just  before  the  execution  and  can  be 
maintained  in  a  separate  file.  There  may  be  several  different 
print  instrumentations  defined  for  the  same  target  program; 
keeping them in separate files provides great flexibility in
arranging  a  custom  set  of  print  statements  to  be  inserted  at  run 
time.  Second,  the  instrumentation  is  attached  to  a  particular 
event in the trace matching the pattern ex_stmt IS
'Get_Line:', i.e. traversal of the label Get_Line;
therefore it does not depend on possible target code
modifications as long as the label is not changed.

Debugging  rules  can  be  considered  as  a  way  of  formalizing 
reasoning  about  the  target  program  execution  —  humans  often 
use  similar  patterns  for  reasoning  when  debugging  programs. 
For  example,  if  the  index  expression  of  an  array  element  is 
out  of  range,  the  debugger  can  try  a  rule  for  eval-index  events 
that  invokes  another  rule  about  a  wrong  value  of  the  event 
eval-expression,  which  in  turn  will  cause  investigation  of 
histories  of  all  variables  included  in  the  expression. 

5  BRIEF  IMPLEMENTATION  SURVEY 

The  architecture  of  the  computations  over  the  event  traces 
for  the  C  programming  language  is  based  on  the  automatic 
instrumentation  of  the  target  program  source  code  in  such  a 
way  that  some  computations  over  the  trace  are  performed  at 
run time and the rest of the information is saved in the trace file
for  postmortem  processing.  The  instrumentation  does  not 
change  the  semantics  of  the  target  program.  The  trace  file  is 
read  by  the  FORMAN  interpreter  to  complete  the 
computations  over  the  trace  and  to  generate  messages.  A 
special effort was made in this prototype to optimize the
trace generation, in particular to filter events in order to
reduce the size of the trace.

The front end of the assertion checker was adapted and
modified from Shawn Flisakowski's parser and abstract
syntax tree builder for the complete C programming language
(gcc version) [14]. The instrumentation module was designed
by Ana Erendira Flores-Mendoza as her Master's project in
the NMSU CS Department [15]. The total size of the software
used for the prototype amounts to more than 20 KLOC of
C/lex/yacc/Rigal [1] code.

Since an event in our model has a duration and may
contain other events, it is represented on the trace by two
records, one for the beginning of the event and one for the end.
The semantics of the C language do not specify the order of
subexpression evaluation; to address this issue and to ensure
proper nesting of the eval_expr beginning and end records
on the trace, the instrumented code maintains an auxiliary
stack for expression evaluation. A similar stack mechanism is
added to the instrumented code to maintain proper nesting of
ex_stmt and func_call events when performing return, goto,
and break statements. These specifics of our target program
behavior model led us to the decision to implement the
instrumentation module from scratch rather than to use
a generic instrumentation tool like [33]. The basic
building block for instrumenting an expression E is the
comma-expression (e1, temp = E, e2, temp), where e1 stands for
the prologue instrumentation, e2 stands for the epilogue
instrumentation, and the temp variable holds the result of
evaluating the original expression E.
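
A minimal sketch of this comma-expression building block in plain C is given below. It is an illustration under stated assumptions, not the code generated by the prototype: the macro TRACED_INT, the helpers trace_begin and trace_end, and the counters are hypothetical names, and the helpers only count records instead of writing them to a trace file.

```c
#include <assert.h>

/* Hypothetical trace helpers (assumption): they count begin/end
   records and track how many events are currently open. */
static int begin_records = 0;
static int end_records = 0;
static int depth = 0;

static int trace_begin(void)
{
    begin_records++;
    depth++;
    return 0;
}

static int trace_end(void)
{
    assert(depth > 0);  /* an end record must close an open event */
    end_records++;
    depth--;
    return 0;
}

/* (e1, temp = E, e2, temp): prologue, original expression,
   epilogue, and finally the saved result of E.
   tmp must be an int variable supplied by the caller. */
#define TRACED_INT(tmp, E) \
    (trace_begin(), (tmp) = (E), trace_end(), (tmp))

/* Example target expression (a * 2) + (b + 1) with every
   subexpression wrapped by the building block. */
int instrumented_sum(int a, int b)
{
    int t1, t2, t3;
    return TRACED_INT(t3, TRACED_INT(t1, a * 2) + TRACED_INT(t2, b + 1));
}
```

The comma operator guarantees that each prologue runs before its expression and each epilogue after it, while the value of the whole expression is unchanged. The two operands of + may still be evaluated in either order, which is why the text above calls for an auxiliary stack to keep the trace records properly nested.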

Only  events  necessary  for  the  given  FORMAN  program 
are  involved  in  the  computations  over  the  trace  and  put  on  the 
trace.  For  the  Simple  Tokenizer  program  discussed  above, 
using  the  input  file  with  150  lines  and  454  tokens  and  the 
entire  set  of  debugging  rules  described  in  the  previous  section 
the  total  number  of  events  generated  by  the  target  program 
according  to  the  event  grammar  is  105,808,  although  only 
7253 of them (less than 7%) are put on the trace. Even in its
current  state  with  many  potential  optimizations  not  yet 
implemented,  the  prototype  demonstrates  the  feasibility  of 
trace  computations  for  “typical”  student  programs  like  the 
Simple  Tokenizer.  Our  experiments  with  other  C  programs 
show  that  storing  several  tens  of  thousands  of  events  on  the 
trace  is  sufficient  for  a  large  number  of  “typical”  C  programs 
run  with  a  set  of  debugging  rules  and  assertions  similar  to  the 
examples  in  Sec.  4.  It  should  be  noted  that  typically  the  size 
of  input  data  used  for  testing  and  debugging  purposes  is 
relatively  small. 

6  RELATED  WORK 

What  follows  is  a  very  brief  survey  of  basic  ideas  known 
in  Debugging  Automation  to  provide  the  background  for  the 
approach  advocated  in  this  paper. 

Event  Notion 

The  Event  Based  Behavioral  Abstraction  (EBBA)  method 
suggested  in  [7]  characterizes  the  behavior  of  the  entire 
program  in  terms  of  both  primitive  and  composite  events. 
Context  conditions  involving  event  attribute  values  can  be 
used  to  distinguish  events.  EBBA  defines  two  higher-level 
means  for  modeling  system  behavior  —  clustering  and 
filtering.  Clustering  is  used  to  express  behavior  as  composite 
events, i.e. aggregates of previously defined events. Filtering
serves to eliminate from consideration events which are not
relevant  to  the  model  being  investigated.  Both  event 
recognition  and  filtering  can  be  performed  at  run-time. 

An  event-based  debugger  for  the  C  programming  language 
called  Dalek  [27]  provides  a  means  for  describing  user- 
defined  events  which  typically  are  points  within  a  program 
execution  trace.  A  target  program  has  to  be  instrumented  in 
order  to  collect  values  of  event  attributes.  Composite  events 
can  be  recognized  at  run-time  as  collections  of  primitive 
events. 

FORMAN  has  a  more  comprehensive  modeling  approach 
than  EBBA  or  Dalek,  based  on  the  event  grammar.  A 
language  for  expressing  computations  over  execution 
histories  is  provided,  which  is  missing  in  EBBA  and  Dalek. 
The  event  grammar  makes  FORMAN  suitable  for  automatic 
source  code  instrumentation  to  detect  all  necessary  events. 
FORMAN  supports  the  design  of  universal  assertions  and 
debugging rules that could be used for debugging of arbitrary
target  programs.  This  generality  is  missing  in  the  EBBA  and 
Dalek  approaches.  The  event  in  FORMAN  is  a  time  interval, 
in  contrast  with  the  event  notion  in  previous  approaches 
where  events  are  considered  pointwise  time  moments. 

The  COCA  debugger  [13]  for  the  C  language  uses  the 
GDB debugger for tracing and PROLOG for executing debugging
queries. It provides a certain event grammar for C
traces  and  event  patterns  based  on  attributes  for  event  search. 
The  query  language  is  designed  around  special  primitives 
built  into  the  PROLOG  query  evaluator.  We  assume  that 
FORMAN  is  more  suitable  for  trace  computations  as  it  has 
been  designed  for  this  specific  purpose. 

Path  Expressions 

Data  and  control  flow  descriptions  of  the  target  program 
are  essential  for  testing  and  debugging  purposes.  It  is  useful 
to  give  such  a  description  in  an  explicit  and  precise  form.  The 
path  expression  technique  introduced  for  specifying  parallel 
programs  in  [11]  is  one  such  formalism.  Trace  specifications 
also  are  used  in  [26]  for  software  specification.  This 
technique  has  been  used  in  several  projects  as  a  background 
for  high-level  debugging  tools,  (e.g.  in  [10]),  where  path  rules 
are  suggested  as  a  kind  of  debugger  commands.  FORMAN 
provides  a  flexible  language  means  for  trace  specification 
including  event  patterns  and  regular  expressions  over  them. 

Assertion  Languages 

Assertion  (or  annotation)  languages  provide  yet  another 
approach  to  debugging  automation.  The  approaches  currently 
in  use  are  mostly  based  on  boolean  expressions  attached  to 
selected  points  of  the  target  program,  like  the  assert  macro  in 
C  [18].  The  ANNA  [23]  annotation  language  for  the  Ada 
target  language  supports  assertions  on  variable  and  type 
declarations.  In  the  TSL  [22],  [29]  annotation  language  for 
Ada the notion of event is introduced in order to describe the
behavior of Tasks. Patterns can be written which involve
parameter  values  of  Task  entry  calls.  Assertions  are  written  in 
Ada  itself,  using  a  number  of  special  pre-defined  predicates. 
Assertion-checking  is  dynamic  at  run-time,  and  does  not  need 
post-mortem  analysis.  The  RAPIDE  project  [24]  provides  an 
event-based  assertion  language  for  software  architecture 
description. 

In  [6]  events  are  introduced  to  describe  process 
communication,  termination,  and  connection  and  detachment 
of processes to channels. A language of Behavior Expressions
(BE)  is  provided  to  write  assertions  about  sequences  of 
process  interactions.  BE  is  able  to  describe  allowed 
sequences  of  events  as  well  as  some  predicates  defined  on  the 
values  of  the  variables  of  processes.  Event  types  are  process 
communication  and  interactions  such  as  send,  receive, 
terminate,  connect,  detach.  Evaluation  of  assertions  is  done  at 
run-time.  No  composite  events  are  provided. 

Another  experimental  debugging  tool  is  based  on  trace 
analysis  with  respect  to  assertions  in  temporal  interval  logic. 
This  work  is  presented  in  [20]  where  four  types  of  events  are 
introduced:  assignment  to  variables,  reaching  a  label, 
interprocess  communication  and  process  instantiation  or 
termination.  Composite  events  cannot  be  defined.  Different 
varieties  of  temporal  logic  languages  are  used  for  program 
static  analysis  called  Model  Checking  [12]. 

In  [30]  a  practical  approach  to  programming  with 
assertions  for  the  C  language  is  advocated,  and  it  is 
demonstrated  that  even  local  assertions  associated  with 
particular  points  within  the  program  may  be  extremely  useful 
for  program  debugging. 

The  DUEL  [19]  debugging  language  introduces 
expressions  for  C  aggregate  data  exploration,  for  both 
assertions  and  queries. 

The  FORMAN  language  for  computations  over  traces 
provides  a  flexible  means  for  writing  both  local  and  global 
assertions,  including  those  about  temporal  relations  between 
events. 

Algorithmic  Debugging 

The  original  algorithmic  program  debugging  method  was 
introduced  in  [32]  for  the  Prolog  language.  In  [31]  and  [16] 
this  paradigm  is  applied  to  a  subset  of  PASCAL.  The 
debugger  executes  the  program  and  builds  a  trace  execution 
tree  at  the  procedure  level  while  saving  some  useful  trace 
information  such  as  procedure  names  and  input/output 
parameter  values.  The  algorithmic  debugger  traverses  the 
execution  tree  and  interacts  with  the  user  by  asking  about  the 
intended behavior of each procedure. The user can
answer “yes” or “no” to each question. The search ends and a bug
is localized within a procedure p when one of the following
holds: procedure p contains no procedure calls, or all
procedure calls performed from the body of procedure p
fulfill the user’s expectations.
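
The search described above can be sketched in C as follows. This is a simplified illustration, not the algorithm of [32] as implemented in any tool: the struct call_node, the MAX_CALLS limit, and the function localize_bug are hypothetical, and the user's “yes”/“no” answers are assumed to be recorded in the tree in advance instead of being asked interactively.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLS 4   /* arbitrary limit chosen for this sketch */

/* Hypothetical execution-tree node for one procedure invocation. */
struct call_node {
    const char *name;
    int as_intended;   /* recorded answer: 1 for "yes", 0 for "no" */
    size_t n_calls;    /* number of calls made from the body */
    const struct call_node *calls[MAX_CALLS];
};

/* Given an invocation judged to behave wrongly, return the
   invocation in which the bug is localized. */
const struct call_node *localize_bug(const struct call_node *p)
{
    size_t i;
    assert(p != NULL && !p->as_intended);
    for (i = 0; i < p->n_calls; i++) {
        if (!p->calls[i]->as_intended)
            return localize_bug(p->calls[i]);  /* descend into first wrong call */
    }
    /* p makes no calls, or all of its calls fulfill expectations */
    return p;
}
```

The function descends into the first call judged wrong and stops at a procedure whose own calls all behave as intended, which matches the two termination conditions listed above.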

Algorithmic  debugging  can  be  considered  as  an  example 
of  debugging  strategy,  based  on  some  assertion  language  (in 
this  case  assertions  about  results  of  a  procedure  call).  The 
notion  of  computation  over  execution  trace  introduced  in 
FORMAN  may  be  a  convenient  basis  for  describing  such 
debugging  strategies. 

7  CONCLUSIONS 

In  brief,  our  approach  can  be  explained  as  “computations 
over  a  target  program  event  trace  based  on  a  precise  program 
behavior  model”.  According  to  [8]  and  [28],  approximately 
40-50%  of  all  bugs  detected  during  the  program  testing  are 
logic,  structural,  and  functionality  bugs,  i.e.,  bugs  which 
could  be  detected  by  appropriate  assertion  checking  similar  to 
that  demonstrated  above. 

We  expect  the  advantages  of  our  approach  to  be  the 
following: 

• The notion of an event grammar provides a general
basis for program behavior models. In contrast with
previous approaches, the event is not a point in the trace but
an interval with a beginning and an end.

• Event grammar provides a coordinate system to refer to
any interesting event in the execution history. Event
attributes provide complete access to each target
program’s execution state. Assertions about particular
execution states as well as assertions about sets of different
execution states may be checked.

• The IN relation yields a hierarchy of events, so the
assertions can be defined at an appropriate level of
granularity.

• A language for computations over event traces
provides a uniform framework for assertion checking,
profiles, debugging queries, and performance measurements.

• The fact that assertions and other computations over the
target program event trace can be separated from the
text of the target program allows accumulation of
formalized knowledge about particular programs and makes
it easy to control the number of assertions to be checked.

The first experiments with our C assertion checker
prototype indicate that:

• instrumentation of the C source code may be an appropriate
technique for automatic testing and debugging tool
design,

•  event  filtering  can  reduce  the  size  of  the  stored  event 
trace  to  5-20%  of  the  total  trace, 

•  the  size  of  the  stored  event  trace  could  be  kept  within 
reasonable  limits  (several  tens  of  thousands  of  events)  for 
realistic  C  programs. 

Future work will be dedicated to further optimizations
of trace computation and event filtering, and to the design of
an appropriate user interface.

ABSTRACT

This  paper  suggests  an  approach  to  the  development  of 
software  testing  and  debugging  automation  tools  based  on 
precise  program  behavior  models.  The  program  behavior 
model  is  defined  as  a  set  of  events  (event  trace)  with  two  basic 
binary  relations  over  events  -  precedence  and  inclusion,  and 
represents  the  temporal  relationship  between  actions.  A 
language  for  the  computations  over  event  traces  is  developed 
that  provides  a  basis  for  assertion  checking,  debugging 
queries,  execution  profiles,  and  performance  measurements. 

The  approach  is  nondestructive,  since  assertion  texts  are 
separated  from  the  target  program  source  code  and  can  be 
maintained  independently.  An  event  grammar  provides  a 
sound  basis  for  assertion  language  implementation  via  target 
program  automatic  instrumentation.  Preliminary  experiments 
with  a  prototype  assertion  checker  for  the  C  programming 
language  are  discussed. 

1  INTRODUCTION 

Program testing and debugging is still a human activity
performed largely without any adequate tools, and consuming
more than 50% of the total program development time
and effort [8]. Testing and debugging are mostly concerned
with the program run-time behavior, and developing a
precise model of program behavior becomes the first step
towards any dynamic analysis automation. In building such a
model several considerations were taken into account. The first
assumption we make is that the model is discrete, i.e.
comprises a finite number of well-separated elements. For this
reason the notion of event as an elementary unit of action is
an appropriate basis for building the whole model. The event
is an abstraction for any detectable action performed during
the program execution, such as a statement execution,
expression evaluation, procedure call, sending and receiving
a message, etc.

Actions  (or  events)  are  evolving  in  time  and  the  program 
behavior  represents  the  temporal  relationship  between 
actions.  This  implies  the  necessity  to  introduce  an  ordering 
relation  for  events.  Semantics  of  parallel  programming 
languages  and  even  some  sequential  languages  (such  as  C)  do 
not require the total ordering of actions, so partial event
ordering is the most adequate for this purpose [19].

Actions performed during the program execution are at
different levels of granularity; some of them include other
actions, e.g. a subroutine call event contains statement
execution events. This consideration brings an inclusion
relation into our model. Under this relationship, events can be
hierarchical objects and it becomes possible to consider
program behavior at appropriate levels of granularity.

An  event  may  have  a  type  and  some  other  attributes,  such 
as  event  duration,  program  source  code  related  to  the  event, 
program  state  associated  with  the  event  (i.e.  program  variable 
values  at  the  beginning  and  at  the  end  of  the  event),  etc.  This 
program  behavior  model  may  be  regarded  as  a  “lightweight” 
semantics  of  the  programming  language. 

The  next  problem  to  be  addressed  after  the  program 
behavior  model  is  set  up  is  the  formalism  for  specifying 
properties  of  the  program  behavior.  This  could  be  done  in 
many  different  ways,  e.g.,  by  adopting  some  kind  of  logic 
calculi  (predicate  logic,  temporal  logic).  Such  a  direction 
leads  to  tools  for  static  program  verification,  such  as  an 
approach called model checking [11].

Since  our  goal  is  dynamic  program  analysis  that  requires 
different  types  of  assertion  checking,  debugging  queries, 
program  execution  profiles,  and  so  on,  we  developed  the 
concept  of  a  computation  over  the  event  trace.  It  seems  that 
this  concept  is  general  enough  to  cover  all  the  above 
mentioned needs in a unifying framework, and provides
sufficient  flexibility.  This  approach  implies  the  design  of  a 
special  programming  language  for  computations  over  the 
event  traces.  We  suggest  a  particular  language  called 
FORMAN  ([2],  [16])  based  on  a  functional  paradigm  and  the 
use  of  event  patterns  and  aggregate  operations  over  events. 
The  papers  [1],  [2],  [16]  are  based  on  our  assertion  checker 
prototype  for  a  subset  of  the  PASCAL  language.  This  paper 
describes  the  first  experience  with  an  assertion  checker  for  the 
C  programming  language.  The  implementation  of  the  C 
assertion  checker  is  based  on  source  code  automatic 
instrumentation.  To  adjust  to  the  specifics  of  the  C  target 
language  the  FORMAN  language  has  been  modified,  in 
particular, the scope construct (WITHIN function-name) and
explicit type cast have been added (see examples in Sec. 4).

Patterns  describe  the  structure  of  events  with  context 
conditions.  Program  paths  can  be  described  by  path 
expressions  over  events.  All  this  makes  it  possible  to  write 
assertions  not  only  about  variable  values  at  program  points 
but  also  about  data  flow  and  control  flow  in  the  target 
program. 

Possible  applications  of  a  language  for  computations  over 
a  program  event  trace  include  program  testing  and  debugging, 
performance  measurement  and  modeling,  program  profiling, 
program  animation,  program  maintenance  and  program 
documentation  [4].  Even  the  traditional  debugging  method 
based  on  scattering  print  statements  across  the  source  code 
may  be  easily  implemented  as  an  appropriate  computation  on 
the event trace (see example in Sec. 4). The advantage is that
the  print  statements  are  kept  in  a  separate  file  and  the  source 
code  of  the  target  program  will  be  instrumented  automatically 
just  before  execution.  A  study  of  applying  FORMAN  to 
parallel  programming  is  presented  in  [3]. 

2  EVENTS 

FORMAN  is  based  on  a  semantic  model  of  target  program 
behavior  in  which  the  program  execution  is  represented  by  a 
set  of  events.  An  event  occurs  when  some  action  is  performed 
during  the  program  execution  process.  For  instance,  a 
function  is  called,  a  statement  is  executed,  or  some  expression 
is  evaluated.  A  particular  action  may  be  performed  many 
times,  but  every  execution  of  an  action  is  denoted  by  a  unique 
event. 

Every  event  defines  a  time  interval  which  has  a  beginning 
and  an  end.  For  atomic  events,  the  beginning  and  end  points 
of  the  time  interval  will  be  the  same.  All  events  used  for 
assertion  checking  and  other  computations  over  event  traces 
must be detectable by some implementation (e.g. by
appropriate target program instrumentation). Attributes
attached  to  events  bring  additional  information  about  event 
context,  such  as  current  variable  and  expression  values. 

In  order  to  give  some  rationale  for  our  notion  of  an  event, 
let  us  consider  a  well-known  idea  such  as  a  counter.  Usually 
the  history  of  a  variable  X  when  used  as  a  counter  looks  like: 

X := 0; ...
Loop ...
    X := X + 1; ...
endloop; ...

In  order  to  determine  whether  the  actual  behavior  of  the 
counter  X  matches  the  pattern  described  by  the  program 
fragment above we have to consider the following events. Let
Initialize_X denote the event of assigning 0 to the variable
X, Augment_X denote the event of incrementing X, and
Assign_X denote the event of assigning any value to the
variable X. The event Assign_X is a composite one; it
contains either Initialize_X or Augment_X events. One could
determine  if  X  behaves  as  a  counter  when  a  program  segment 
S  is  executed  in  the  following  way.  First,  the  sequence  A  of 
all  events  of  the  type  Assign_X  from  the  event  trace  of 
program  segment  S  has  to  be  extracted  preserving  the 
ordering  between  events.  Second,  A  has  to  be  matched  with 
the  pattern: 

Initialize_X (Augment_X)*

where * denotes repetition zero or more times. If the actual
sequence of events does not match this pattern we can
report an error. Therefore, assertion checking can be
represented as a kind of computation over a target program
event trace.
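
The matching step can be sketched in C as follows. The sketch assumes that the sequence A of Assign_X events has already been extracted from the trace in order; the enum assign_x_kind and the function behaves_as_counter are hypothetical names introduced for this illustration and are not part of FORMAN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of Assign_X events. */
enum assign_x_kind { INITIALIZE_X, AUGMENT_X };

/* Returns 1 if events[0..n-1] matches the pattern
   Initialize_X (Augment_X)*, and 0 otherwise. */
int behaves_as_counter(const enum assign_x_kind *events, size_t n)
{
    size_t i;
    assert(events != NULL || n == 0);
    if (n == 0 || events[0] != INITIALIZE_X)
        return 0;   /* the pattern requires a leading Initialize_X */
    for (i = 1; i < n; i++) {
        if (events[i] != AUGMENT_X)
            return 0;   /* any other assignment breaks the pattern */
    }
    return 1;
}
```

A result of 0 corresponds to the error report mentioned above; a result of 1 means that X was initialized once and only incremented afterwards.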

The  program  state  (current  values  of  variables)  can  be 
considered  at  the  beginning  or  at  the  end  of  an  appropriate 
event.  This  provides  the  opportunity  to  write  assertions  about 
program  variable  values  at  different  points  in  the  program 
execution  history. 

3  PROGRAM  BEHAVIOR  MODEL 

FORMAN  is  intended  to  be  used  to  specify  behavior  of 
programs  written  in  some  high-level  programming  language 
which  is  called  the  target  language.  The  model  of  target 
program behavior is formally defined as a set of events (event
trace) with two basic relations, which may or may not hold
between  two  arbitrary  events.  The  events  may  be  sequentially 
ordered  (PRECEDES),  or  one  of  them  might  be  included  in 
another  composite  event  (IN).  For  each  pair  of  events  in  the 
event  trace  no  more  than  one  of  these  relations  can  be 
established. 

In  order  to  define  the  behavior  model  for  a  particular  target 
language,  types  of  events  are  introduced.  Each  event  belongs 
to one or more predefined event types, which are induced
by  target  language  abstract  syntax  (e.g.  execute-statement, 
send-message,  receive-message)  or  by  target  language 
semantics  (e.g.,  rendezvous,  wait,  put-message-in-queue). 

The  target  program  execution  model  is  defined  by  an  event 
grammar.  The  event  may  be  a  compound  object,  in  which 
case  the  grammar  describes  how  the  event  is  split  into  other 
event  sequences  or  sets.  The  event  grammar  is  a  set  of  axioms 
that  describe  possible  patterns  of  basic  relations  between 
events  of  different  types  in  the  program  execution  history;  it 
is  not  intended  to  be  used  for  parsing  an  actual  event  trace. 

The rule A :: B C establishes that if an event a of the
type  A  occurs  in  the  trace  of  a  program,  it  is  necessary  that 
events  b  and  c  of  types  B  and  C  also  exist,  such  that  the 
relations  b  IN  a,  c  IN  a,  b  PRECEDES  c  hold. 
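
One way to make this rule concrete is to model each event as a time interval and to test the two relations on intervals. The sketch below is only one possible reading, assuming a single logical clock; the struct event and the three functions are hypothetical names, and a partial order between events cannot always be recovered from timestamps alone.

```c
#include <assert.h>

/* Hypothetical event record: an interval on a logical clock,
   with begin == end for atomic events. */
struct event {
    long begin;
    long end;
};

/* a PRECEDES b: a has ended before b begins. */
int event_precedes(const struct event *a, const struct event *b)
{
    assert(a->begin <= a->end && b->begin <= b->end);
    return a->end < b->begin;
}

/* a IN b: the interval of a lies inside that of b
   and the two intervals are not identical. */
int event_in(const struct event *a, const struct event *b)
{
    return b->begin <= a->begin && a->end <= b->end
        && (a->begin != b->begin || a->end != b->end);
}

/* Checks the relations required by the rule A :: B C for
   concrete events a, b and c of types A, B and C. */
int satisfies_rule(const struct event *a, const struct event *b,
                   const struct event *c)
{
    return event_in(b, a) && event_in(c, a) && event_precedes(b, c);
}
```

With these definitions, PRECEDES and IN can never hold together for the same ordered pair of events, in line with the requirement that no more than one relation be established for a pair.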

For the C language assertion checker prototype we have
defined the following simple event grammar.

(Axiom 1) execute_program::
              ( ex_stmt | eval_expr )*
(Axiom 2) ex_stmt::
              ( ex_stmt | eval_expr )*
(Axiom 3) eval_expr:: func_call |
              eval_expr+ destination? |
              { eval_expr }+
(Axiom 4) func_call::
              { eval_expr }* ex_stmt*

Axiom  1  states  that  the  program  execution  event  contains 
(the  IN  relation)  a  set  of  zero  or  more  ordered  (w.r.t.  relation 
PRECEDES)  events  of  the  types  execute-statement  or 
evaluate-expression. 

Axiom  2  states  the  same  fact  about  the  execute_statement 
event.  For  example,  the  event  of  executing  a  composite 
statement  such  as  if-then-else  will  contain  an  event 
eval_expr  for  condition  evaluation  and  a  sequence  of  zero 
or  more  events  for  the  corresponding  THEN  or  ELSE  branch 
execution.  If  a  statement  has  a  label  attached,  the  label 
traversal  itself  is  considered  as  an  empty  statement  execution 
event. 

Axiom  3  describes  the  possible  structure  of  an  expression 
evaluation  event:  it  may  contain  a  function  call  event  or  may 
be  an  ordered  sequence  of  other  expression  evaluation  events 
(e.g. for a “comma” expression). The assignment expression
evaluation contains the event destination, which is
distinguished because it is of special importance for
assertion checking. In our implementation we have avoided
any assumptions about the ordering of argument evaluation
for binary operations, such as ‘+’ or ‘*’, since the C language
semantics leaves this undefined [17]. The grammar rule
{ eval_expr }+ denotes a set of one or more events of the
type eval_expr without any ordering relationship.

Axiom  4  describes  the  structure  of  a  function  call  event 
which starts with a (possibly empty) set of unordered events
for  actual  parameter  evaluation  followed  by  the  function  body 
execution  events. 

The  order  of  event  occurrences  reflects  the  semantics  of 
the  target  language.  When  performing  an  assignment 
statement,  first  the  right-hand  part  is  evaluated  and  after  this 
the  destination  event  occurs  (which  denotes  the  assignment 
event  itself).  The  event  grammar  makes  FORMAN  suitable 
for  automatic  source  code  instrumentation  to  detect  all 
necessary  events. 

An event has attributes, such as the source text fragment
from the corresponding target program, current values of
target program variables and expressions at the beginning and
at the end of the event, the duration of the event, a previous path
(i.e. set of events preceding the event in the target program
execution history), etc.

FORMAN  supplies  a  means  for  writing  assertions  about 
events  and  event  sequences  and  sets.  These  include 
quantifiers  and  other  aggregate  operations  over  events,  e.g., 
sequence,  bag  and  set  constructors,  boolean  operations  and 
operations  of  the  target  language  to  write  assertions  about 
target  program  variables. 

Events  can  be  described  by  patterns  which  capture  the 
structure of an event and context conditions. Program paths can
be  described  by  regular  path  expressions  over  events. 

4  EXAMPLES  OF  DEBUGGING  RULES 

In  general,  a  debugging  rule  performs  some  actions  that 
may  include  computations  over  the  target  program  event 
trace.  The  aim  is  to  generate  informative  messages  and  to 
provide  the  user  with  some  values  obtained  from  the  trace  in 
order  to  detect  and  localize  bugs.  An  assertion  is  a  boolean 
expression  that  may  contain  quantifiers  and  sequencing 
constraints  over  events. 

Assertions  can  be  used  as  conditions  in  the  rules 
describing  actions  that  can  be  performed  if  an  assertion  is 
satisfied  or  violated.  A  debugging  rule  has  the  form: 

assertion  SAY  (expression  sequence) 

ONFAIL  SAY  (expression  sequence) 

We  will  use  as  an  example  of  a  C  program  the  Simple 
Tokenizer  program  described  in  [23].  This  program  reads  a 
text file until the special symbol ‘.’ (dot) is read, recognizes
small  integers,  identifiers,  and  some  predefined  key  words, 
skips  spaces  and  PASCAL-like  comments,  prints  the  input 
text  with  line  numbers  attached  before  each  line,  splits  the 
output  into  pages  with  a  page  header  on  the  top  of  each  page 
(including  page  number),  and  reports  each  token  recognized. 
Unrecognized  symbols  are  printed  as  ERROR  tokens.  The 
source  code  contains  542  lines  of  code  (including  some  of  our 
updates  and  comments).  The  input  file  used  for  running  the 
following examples contained 150 lines of text with a total of
454 tokens. The corresponding output contained 13 pages
with a maximum of 50 lines per page (including the input lines
and  messages  about  tokens  recognized,  each  on  a  separate 
line  of  output). 

Example  of  a  debugging  query. 

In order to obtain the history of a global variable
page_number the following computation over the event
trace can be performed. The WITHIN construct indicates the
scope of the trace computations defined by this rule. The rule
condition is TRUE, and as a side effect the entire history of
variable page_number is shown. The [ ... ] list
constructor defines a loop over the entire program event trace
(execute_program event). All events matching the
pattern func_call IS printf executed within the
body of print_page_header function are selected from
the trace and the function VALUE is applied to them. The
metavariable C holds the event func_call under
consideration. The resulting sequence consists of variable
page_number values at the end of each event captured by
metavariable C during the program execution.

WITHIN print_page_header
TRUE
SAY( 'The history of page_number variable values is: '
     [ C: func_call IS printf
       FROM execute_program
       APPLY VALUE (int) (AT C page_number) ] );
END

When  executed  on  our  prototype  the  following  output  is 
produced: 

The  history  of  page_number  variable  values 
is:  1  2  3  4  5  6  7  8  9  10  11  12  13 

This  debugging  rule  provides  a  slice  of  the  program 
execution  history  containing  the  trace  of  particular  variable 
values.  The  matter  of  interest  may  be,  for  instance,  to  check 
whether  the  values  in  the  variable  history  are  arranged  in 
ascending  order. 
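
The ascending-order check mentioned above can be sketched outside FORMAN. The following Python fragment is only an illustration and not part of the prototype: the trace is modelled as a hypothetical list of (callee name, page_number value) pairs, and the helper names variable_history and is_ascending are introduced here for the example.

```python
# Hypothetical stand-in for the event trace: one pair per func_call event,
# holding the callee name and the value of page_number at the end of the
# event. The prototype's real trace format is not reproduced here.
trace = [("printf", n) for n in range(1, 14)] + [("get_token", 13)]

def variable_history(trace, callee):
    """Collect page_number values for every call of `callee`, in trace order."""
    return [value for name, value in trace if name == callee]

def is_ascending(values):
    """True if each value is strictly greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

history = variable_history(trace, "printf")
print(history, is_ascending(history))
```

The list comprehension plays the role of the [ ... ] list constructor, and the final check is an ordinary computation over the resulting sequence.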

Example of assertion checking.

Let us write and check the assertion: “There exists an input
line with length exceeding some maximum, say 10.” The
program snippet containing the function
get_source_line looks like:

BOOLEAN get_source_line()
{
    char print_buffer[MAX_SOURCE_LINE_LENGTH + 9];
    if ((fgets(source_buffer, MAX_SOURCE_LINE_LENGTH,
               source_file)) != NULL) {
        ++line_number;
Get_Line:
        sprintf(print_buffer, "%4d %d: %s",
                line_number, level, source_buffer);
        print_line(print_buffer);
        return (TRUE);
    }
    else return (FALSE);
}

Traversal of a label is an event of the type ex_stmt, and
we can check the value of the C expression
strlen(source_buffer) > 10 after this event.

WITHIN get_source_line
EXISTS L: ex_stmt IS 'Get_Line:'
FROM execute_program
VALUE(int)(AT L strlen(source_buffer) > 10)
SAY('Too long input line detected at stmt')
SAY(L)
SAY('It is '
VALUE(int)(AT L strlen(source_buffer))
'characters long')
ONFAIL SAY('No long input lines detected');

We check whether the expression
strlen(source_buffer) > 10 is not equal to 0 for all
events L. When the assertion is satisfied for the first time, the
assertion evaluation terminates and the current value of the
metavariable L can be used for message output. In order to
make error messages more informative, the value of a
metavariable when printed by the SAY clause is shown in the
form:

event-type :> event-source-text
source_line_number
within function_name
Time= event-begin-time .. event-end-time

Event  begin  and  end  times  in  this  prototype 
implementation  are  simply  values  of  the  step  counter. 

When  executed  on  our  prototype  this  assertion  checking 
yields  the  following  output. 

Too long input line detected at stmt
ex_stmt :> 'Get_Line:' source line 460
within function get_source_line
Time= 95 .. 96
It is 20 characters long
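
The evaluation order of such an existential assertion (stop at the first matching event, then reuse that event in the message) can be illustrated with a short Python sketch. This is an assumption-laden model rather than the prototype's implementation: events are hypothetical (step, source_buffer contents) pairs, and the function name exists is chosen for the example.

```python
# Hypothetical events: (step counter, contents of source_buffer at Get_Line:).
events = [(10, "short"), (95, "x" * 20), (120, "y" * 30)]

def exists(events, predicate):
    """Return the first event satisfying the predicate, or None.

    Evaluation stops at the first match, mirroring the way the assertion
    terminates as soon as it is satisfied.
    """
    for event in events:
        if predicate(event):
            return event
    return None

hit = exists(events, lambda e: len(e[1]) > 10)
if hit is not None:
    print("Too long input line detected at step", hit[0])
    print("It is", len(hit[1]), "characters long")
else:
    print("No long input lines detected")
```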

Example of run-time statistics gathering.

It is hard to measure the real execution time of a heavily
instrumented target program, although simulated time
measurement may be performed provided that events have
some predefined duration attributes. In order to obtain the
actual number of function calls executed, the number of calls
to function get_source_line, and the number of tokens
recognized by the Simple Tokenizer, the following query can
be performed:

TRUE
SAY('Total function calls'
CARD [ ALL func_call
FROM execute_program] )
SAY('Total function get_source_line calls'
CARD [ func_call IS get_source_line
FROM execute_program] )
SAY('Total tokens recognized'
CARD [ ALL func_call IS get_token
FROM execute_program]
', among them '
CARD [ ALL F: func_call &
SOURCE_TEXT(F) == 'get_token'
AND VALUE (int)(AT F token == ERROR)
FROM execute_program]
'ERROR tokens detected' );

The CARD operator returns the number of items selected
by the aggregate operation, i.e. the number of events
matching the pattern in the aggregate operation body. The
ALL option in the aggregate operation indicates that all nested
events of the type func_call should be taken into account.
The pattern in the last aggregate operation provides an
example of a complex event pattern with a context condition
attached. The scope of this trace computation is the entire
program trace. After execution on our prototype the
following output is obtained.

Total  function  calls  6802 

Total  function  get_source_line  calls  150 

Total  tokens  recognized  454  ,  among  them  37 
ERROR  tokens  detected 
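
As a rough illustration of what the CARD aggregate computes, the sketch below counts events selected by a predicate. It is written in Python under stated assumptions: the flat list of (callee name, token value) pairs is a hypothetical stand-in for the trace, and card is a name introduced here, not a FORMAN or prototype API.

```python
# Hypothetical flat list of func_call events: (callee name, token value or None).
calls = [
    ("get_source_line", None),
    ("get_token", "IDENT"),
    ("print_token", None),
    ("get_token", "ERROR"),
    ("print_token", None),
]

def card(events, predicate=lambda e: True):
    """Number of events selected by the predicate (the role CARD plays above)."""
    return sum(1 for e in events if predicate(e))

total_calls = card(calls)
token_calls = card(calls, lambda e: e[0] == "get_token")
# Pattern with a context condition: a get_token call whose token is ERROR.
error_tokens = card(calls, lambda e: e[0] == "get_token" and e[1] == "ERROR")
print(total_calls, token_calls, error_tokens)
```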

Example  of  path  expression  checking. 

Regular expressions over event patterns may describe
sequences of events extracted from the event trace. The
following assertion checks whether calls to functions get_token
and print_token appear in a certain order. The sequence
of events satisfying the pattern X: func_call &
SOURCE_TEXT(X) == 'get_token' OR
SOURCE_TEXT(X) == 'print_token' is selected from
the entire event trace and matched against the path expression
(func_call IS 'get_token' func_call IS
'print_token')+. A message is produced with
information about the pattern matching results.

[ X: func_call & SOURCE_TEXT(X) == 'get_token' OR
SOURCE_TEXT(X) == 'print_token'
FROM execute_program ]
SATISFIES (func_call IS 'get_token'
func_call IS 'print_token')+
SAY('function calls follow the pattern
(get_token print_token)+')
ONFAIL SAY('pattern
(get_token print_token)+
is violated');
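
The two steps of this assertion (select the relevant calls, then match the selected sequence against a regular path expression) can be imitated with an ordinary regular expression. The Python sketch below is illustrative only; the one-letter encoding and the function name follows_pattern are assumptions made for the example and do not reflect how the FORMAN interpreter matches path expressions.

```python
import re

def follows_pattern(call_names):
    """Check the selected call sequence against (get_token print_token)+."""
    # Encode each relevant call as one letter so that a standard regular
    # expression can stand in for the path expression; other calls are
    # dropped, which corresponds to the selection step of the assertion.
    code = {"get_token": "g", "print_token": "p"}
    encoded = "".join(code[name] for name in call_names if name in code)
    return re.fullmatch(r"(gp)+", encoded) is not None

print(follows_pattern(["main", "get_token", "print_token", "get_token", "print_token"]))
print(follows_pattern(["get_token", "get_token", "print_token"]))
```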

Example of instrumenting the target source code with
print statements.

Suppose we want to insert in the target
source code print statements to print at run time the value of
input strings with length exceeding 10 and corresponding line
numbers. Values of interest are available in global variables
source_buffer and line_number, respectively. The
following debugging rule performs this function.

WITHIN get_source_line
FOREACH LI: ex_stmt IS 'Get_Line:'
FROM execute_program
VALUE (int)
( AT LI strlen(source_buffer) > 10 ?
printf("long line!!!\n%s\n", source_buffer) : 1 )
AND
VALUE (int)
( AT LI
printf("line_number=%d\n", line_number) );
END

Formally, this rule causes an assertion check, which
will be successful since the C expression involved yields a
non-zero value (representing Boolean TRUE); as a side effect
the print statements are executed at run time. This debugging
rule has two aspects worthy of notice. First, the
instrumentation code is separated from the target code; it will
be inserted automatically just before the execution and can be
maintained in a separate file. There may be several different
print instrumentations defined for the same target program;
keeping them in separate files provides great flexibility in
arranging a custom set of print statements to be inserted at run
time. Second, the instrumentation is attached to a particular
event in the trace matching the pattern ex_stmt IS
'Get_Line:', i.e. traversal of the label Get_Line:;
therefore it does not depend on possible target code
modifications as long as the label is not changed.

5  BRIEF  IMPLEMENTATION  SURVEY 

The architecture of the computations over the event traces
for the C programming language is based on the automatic
instrumentation of the target program source code in such a
way that some computations over the trace are performed at
run time and the rest of the information is saved in the trace file
for postmortem processing. The instrumentation does not
change the semantics of the target program. The trace file is
read by the FORMAN interpreter to complete the
computations over the trace and to generate messages. A
special attempt was made in this prototype to optimize the
trace generation, in particular to filter events in order to
reduce the size of the trace.

The front end of the assertion checker was adapted and
modified from Shawn Flisakowski’s parser and abstract
syntax tree builder for the complete C programming language
(gcc version) [13]. The instrumentation module was designed
by Ana Erendira Flores-Mendoza as her Master’s project in
the NMSU CS Department [14]. The total size of the software
used for the prototype amounts to more than 20 KLOC of
C/lex/yacc code.

Since an event in our model has a duration and may
contain other events, it is represented on the trace by two
records, one for the beginning of the event and one for the end.
The semantics of the C language do not specify the order of
subexpression execution; to address this issue and to ensure
proper nesting of eval_expr event beginning and end
records on the trace, the instrumented code maintains an
auxiliary stack for expression evaluation. A similar stack
mechanism is added to the instrumented code to maintain
proper nesting of ex_stmt and func_call events when
performing return, goto, and break statements. These
specifics of our target program behavior model led us to the
decision to implement the instrumentation module from
scratch rather than to use a generic instrumentation tool
like [31].
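
The role of such a stack can be shown with a small model. The Python sketch below is a hypothetical illustration, not the instrumentation module itself: the Tracer class, its unwind_to method, and the record layout are assumptions introduced here to show how open events might be closed, innermost first, when a return leaves nested statements.

```python
class Tracer:
    """Toy trace writer: every event gets a begin record and an end record."""

    def __init__(self):
        self.records = []   # the trace: ("begin" | "end", event_type)
        self.open = []      # stack of events that have begun but not yet ended

    def begin(self, event_type):
        self.open.append(event_type)
        self.records.append(("begin", event_type))
        return len(self.open) - 1          # depth handle for a later unwind

    def end(self):
        self.records.append(("end", self.open.pop()))

    def unwind_to(self, depth):
        """Close every event opened at or above `depth`, innermost first,
        as is needed when a return or break leaves nested statements."""
        while len(self.open) > depth:
            self.end()

def properly_nested(records):
    """True if begin/end records are balanced like parentheses."""
    stack = []
    for kind, event_type in records:
        if kind == "begin":
            stack.append(event_type)
        elif not stack or stack.pop() != event_type:
            return False
    return not stack

t = Tracer()
call = t.begin("func_call")
t.begin("ex_stmt")        # a loop statement
t.begin("ex_stmt")        # a return statement inside the loop
t.unwind_to(call)         # the return closes everything up to the call
print(t.records)
```

Without the unwind step, the two ex_stmt events and the func_call event would have begin records but no end records, and the inclusion relation could not be recovered from the trace.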

Only  events  necessary  for  the  given  FORMAN  program 
are  involved  in  the  computations  over  the  trace  and  put  on  the 
trace.  For  the  Simple  Tokenizer  example  discussed  above, 
using  the  input  file  with  150  lines  and  454  tokens  and  the 
entire  set  of  debugging  rules  described  in  the  previous  section 
the  total  number  of  events  generated  by  the  target  program 
according to the event grammar is 105,808, although only
7,253 of them (less than 7%) are put on the trace. Even in its
current  state  with  many  potential  optimizations  not  yet 
implemented,  the  prototype  demonstrates  the  feasibility  of 
trace  computations  for  “typical”  student  programs  like  the 
Simple  Tokenizer.  Our  experiments  show  that  storing  several 
tens  of  thousands  of  events  on  the  trace  is  sufficient  for 
“typical”  C  programs  run  with  a  set  of  debugging  rules  and 
assertions  similar  to  the  examples  in  Sec.  4.  It  should  be  noted 
that  typically  the  size  of  input  data  used  for  testing  and 
debugging  purposes  is  relatively  small. 
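
The reduction quoted for the Simple Tokenizer can be checked with simple arithmetic, and the filtering idea itself can be sketched in a few lines. The Python fragment below is only an illustration: the two counts come from the text above, while filter_trace and the sample event tuples are hypothetical and do not describe the prototype's actual filter.

```python
total_events = 105_808    # events generated according to the event grammar
stored_events = 7_253     # events actually put on the trace

stored_fraction = stored_events / total_events
print(f"{stored_fraction:.2%} of the events are stored")

def filter_trace(events, needed_types):
    """Keep only events whose type is referenced by the active rules."""
    return [e for e in events if e[0] in needed_types]

# Hypothetical events: (event type, source text).
sample = [("eval_expr", "a+b"), ("func_call", "get_token"), ("ex_stmt", "Get_Line:")]
print(filter_trace(sample, {"func_call", "ex_stmt"}))
```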

6  RELATED  WORK 

What follows is a very brief survey of basic ideas known
in Debugging Automation to provide the background for the
approach advocated in this paper.

Event  Notion 

The  Event  Based  Behavioral  Abstraction  (EBBA)  method 
suggested  in  [6]  characterizes  the  behavior  of  the  entire 
program  in  terms  of  both  primitive  and  composite  events. 
Context  conditions  involving  event  attribute  values  can  be 
used  to  distinguish  events.  EBBA  defines  two  higher-level 
means  for  modeling  system  behavior  -  clustering  and 
filtering.  Clustering  is  used  to  express  behavior  as  composite 
events,  i.e.  aggregates  of  previously  defined  events.  Filtering 
serves  to  eliminate  from  consideration  events  which  are  not 
relevant  to  the  model  being  investigated.  Both  event 
recognition  and  filtering  can  be  performed  at  run-time. 

An event-based debugger for the C programming language
called Dalek [25] provides a means for describing
user-defined events which typically are points within a program
execution  trace.  A  target  program  has  to  be  instrumented  in 
order  to  collect  values  of  event  attributes.  Composite  events 
can  be  recognized  at  run-time  as  collections  of  primitive 
events. 

FORMAN  has  a  more  comprehensive  modelling  approach 
than  EBBA  or  Dalek,  based  on  the  event  grammar.  A 
language  for  expressing  computations  over  execution 
histories  is  provided,  which  is  missing  in  EBBA  and  Dalek. 
The  event  grammar  makes  FORMAN  suitable  for  automatic 
source  code  instrumentation  to  detect  all  necessary  events. 
FORMAN  supports  the  design  of  universal  assertions  and 
debugging  rules  that  could  be  used  for  debugging  of  arbitrary 
target  programs.  This  generality  is  missing  in  the  EBBA  and 
Dalek  approaches.  The  event  in  FORMAN  is  a  time  interval, 
in  contrast  with  the  event  notion  in  previous  approaches 
where  events  are  considered  pointwise  time  moments. 

The COCA debugger [12] for the C language uses the
GDB debugger for tracing and PROLOG for executing
debugging queries. It provides a certain event grammar for C
traces  and  event  patterns  based  on  attributes  for  event  search. 
The  query  language  is  designed  around  special  primitives 
built  into  the  PROLOG  query  evaluator.  We  assume  that 
FORMAN  is  more  suitable  for  trace  computations  as  it  has 
been  designed  for  this  specific  purpose. 

Path  Expressions 

Data  and  control  flow  descriptions  of  the  target  program 
are  essential  for  testing  and  debugging  purposes.  It  is  useful 
to  give  such  a  description  in  an  explicit  and  precise  form.  The 
path  expression  technique  introduced  for  specifying  parallel 
programs in [10] is one such formalism. Trace specifications
are also used in [24] for software specification. This
technique has been used in several projects as a background
for high-level debugging tools (e.g. in [9], where path rules
are suggested as a kind of debugger command). FORMAN
provides a flexible language means for trace specification
including event patterns and regular expressions over them.

Assertion  Languages 

Assertion  (or  annotation)  languages  provide  yet  another 
approach  to  debugging  automation.  The  approaches  currently 
in  use  are  mostly  based  on  boolean  expressions  attached  to 
selected  points  of  the  target  program,  like  the  assert  macro  in 
C  [17].  The  ANNA  [21]  annotation  language  for  the  Ada 
target  language  supports  assertions  on  variable  and  type 
declarations.  In  the  TSL  [20],  [27]  annotation  language  for 
Ada  the  notion  of  event  is  introduced  in  order  to  describe  the 
behavior  of  Tasks.  Patterns  can  be  written  which  involve 
parameter  values  of  Task  entry  calls.  Assertions  are  written  in 
Ada  itself,  using  a  number  of  special  pre-defined  predicates. 
Assertion-checking  is  dynamic  at  run-time,  and  does  not  need 
post-mortem  analysis.  The  RAPIDE  project  [22]  provides  an 
event-based  assertion  language  for  software  architecture 
description. 

In  [5]  events  are  introduced  to  describe  process 
communication,  termination,  and  connection  and  detachment 
of  process  to  channels.  A  language  of  Behavior  Expressions 
(BE)  is  provided  to  write  assertions  about  sequences  of 
process  interactions.  BE  is  able  to  describe  allowed 
sequences  of  events  as  well  as  some  predicates  defined  on  the 
values  of  the  variables  of  processes.  Event  types  are  process 
communication  and  interactions  such  as  send,  receive, 
terminate,  connect,  detach.  Evaluation  of  assertions  is  done  at 
run-time.  No  composite  events  are  provided. 

Another  experimental  debugging  tool  is  based  on  trace 
analysis  with  respect  to  assertions  in  temporal  interval  logic. 
This  work  is  presented  in  [18]  where  four  types  of  events  are 
introduced:  assignment  to  variables,  reaching  a  label, 
interprocess  communication  and  process  instantiation  or 
termination.  Composite  events  cannot  be  defined.  Different 
varieties of temporal logic languages are used for program
static analysis called Model Checking [11].

In  [28]  a  practical  approach  to  programming  with 
assertions  for  the  C  language  is  advocated,  and  it  is 
demonstrated  that  even  local  assertions  associated  with 
particular  points  within  the  program  may  be  extremely  useful 
for  program  debugging. 

The  FORMAN  language  for  computations  over  traces 
provides  a  flexible  means  for  writing  both  local  and  global 
assertions,  including  those  about  temporal  relations  between 
events. 

Algorithmic  Debugging 

The  original  algorithmic  program  debugging  method  was 
introduced  in  [30]  for  the  Prolog  language.  In  [29]  and  [15] 
this  paradigm  is  applied  to  a  subset  of  PASCAL.  The 
debugger  executes  the  program  and  builds  a  trace  execution 
tree  at  the  procedure  level  while  saving  some  useful  trace 
information  such  as  procedure  names  and  input/output 
parameter values. The algorithmic debugger traverses the
execution tree and interacts with the user by asking about the
intended  behavior  of  each  procedure.  The  user  has  the 
possibility  to  answer  “yes”  or  “no”  about  the  intended 
behavior  of  the  procedure.  The  search  finally  ends  and  a  bug 
is  localized  within  a  procedure  p  when  one  of  the  following 
holds:  procedure  p  contains  no  procedure  calls,  or  all 
procedure  calls  performed  from  the  body  of  procedure  p 
fulfill  the  user’s  expectations. 
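
The termination condition described above can be made concrete with a short sketch. The Python code below is a simplified illustration under stated assumptions, not the method of [30], [29], or [15]: the Call class, the locate_bug function, and the oracle callback behaves_as_intended (standing in for the user's yes/no answers) are all names invented for this example.

```python
class Call:
    """Node of the execution tree: one procedure call and the calls it made."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def locate_bug(call, behaves_as_intended):
    """Return the name of the procedure in which the bug is localized.

    `call` is assumed to be a call already known to misbehave.
    `behaves_as_intended(call)` plays the role of the user's yes/no answer.
    """
    for child in call.children:
        if not behaves_as_intended(child):
            return locate_bug(child, behaves_as_intended)
    # Either no procedure calls at all, or every callee met expectations.
    return call.name

tree = Call("main", [Call("read_input"),
                     Call("process", [Call("parse"), Call("compute")])])
wrong = {"main", "process", "compute"}   # hypothetical user answers
print(locate_bug(tree, lambda c: c.name not in wrong))
```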

Algorithmic debugging can be considered as an example
of a debugging strategy based on some assertion language (in
this case assertions about results of a procedure call). The
notion  of  computation  over  execution  trace  introduced  in 
FORMAN  may  be  a  convenient  basis  for  describing  such 
debugging  strategies. 

7  CONCLUSIONS 

In  brief,  our  approach  can  be  explained  as  “computations 
over  a  target  program  event  trace  based  on  a  precise  program 
behavior  model”.  According  to  [7]  and  [26],  approximately 
40-50%  of  all  bugs  detected  during  the  program  testing  are 
logic,  structural,  and  functionality  bugs,  i.e.,  bugs  which 
could  be  detected  by  appropriate  assertion  checking  similar 
to  that  demonstrated  above. 

The first experiments with our C assertion checker
prototype show that:

• instrumentation of the C source code may be an appropriate
technique for automatic testing and debugging tool
design,

• event filtering can reduce the size of the stored event
trace to 5-20% of the total trace,

• the size of the stored event trace could be kept within
reasonable limits (several tens of thousands of events) for
realistic C programs.

1 Introduction


We suggest an approach to the development of software testing and debugging automation tools based on precise
program behavior models. The program behavior model is defined as a set of events (event trace) with two basic
binary relations over events, precedence and inclusion, and represents the temporal relationship between actions. A
language for the computations over event traces is developed that provides a basis for assertion checking, debugging
queries, execution profiles, and performance measurements.

The approach is nondestructive, since assertion texts are separated from the target program source code and can be
maintained independently. Assertions can capture the dynamic properties of a particular target program and can also
formalize the general knowledge of typical bugs and debugging strategies. An event grammar provides a sound basis
for assertion language implementation via target program automatic instrumentation. Event grammars may be
designed for sequential as well as for parallel programs. The approach suggested can be adjusted to a variety of
programming languages. We illustrate these ideas on examples for the Occam and C programming languages.

Dynamic program analysis is one of the least understood activities in software development. A major problem is
still the inability to express the mismatch between the expected and the observed behavior of the program on the level
of abstraction maintained by the user [9]. In other words, a flexible and expressive specification formalism is needed
to describe properties of the software system’s implementation. Program testing and debugging is still a human
activity performed largely without any adequate tools and consuming more than 50% of the total program development
time and effort [8]. Debugging concurrent programs is even more difficult because of parallel activities,
non-determinism, and time-dependent behavior.

One way to improve the situation is to partially automate the debugging process. A precise model of program
behavior becomes the first step towards debugging automation. It appears that traditional methods of programming
language semantics definition don’t address this aspect. In building such a model several considerations were taken
into account. The first assumption we make is that the model is discrete, i.e. comprises a finite number of well-separated
elements. This assumption is typical for Computer Science methods used for static and dynamic analysis of programs.
For this reason the notion of event as an elementary unit of action is an appropriate basis for building the whole
model. The event is an abstraction for any detectable action performed during the program execution, such as a
statement execution, expression evaluation, procedure call, sending and receiving a message, etc.

Actions (or events) evolve in time, and the program behavior represents the temporal relationship between
actions. This implies the necessity to introduce an ordering relation for events. Semantics of parallel programming
languages and even some sequential languages (such as C) don’t require the total ordering of actions, so partial event
ordering is the most adequate method for this purpose [11].


Actions performed during the program execution are at different levels of granularity; some of them include other
actions, e.g. a subroutine call event contains statement execution events. This consideration brings the inclusion
relation into our model. Under this relation events can be hierarchical objects and it becomes possible to consider program
behavior at appropriate levels of granularity.

Finally, the program execution can be modeled as a set of events (event trace) with two basic relations: partial
ordering and inclusion. The event trace is in effect a model of the temporal aspect of the program’s behavior. In order to specify
meaningful program behavior properties we have to enrich events with some attributes. An event may have a type and
some other attributes, such as event duration, program source code related to the event, program state associated with
the event (i.e. program variable values at the beginning and at the end of event), etc.

The next problem to be addressed after the program behavior model is set up is the formalism specifying properties
of the program behavior. Since our goal is debugging automation, i.e. a kind of program dynamic analysis that
requires different types of assertion checking, debugging queries, program execution profiles, and so on, we came up
with the concept of a computation over the event trace. It seems that this concept is general enough to cover all the
above-mentioned needs in a unifying framework, and provides sufficient flexibility. This approach implies the
design of a special programming language for computations over the event traces. We suggest a particular language
called FORMAN [1], [3], [10] based on the functional paradigm and the use of event patterns and aggregate operations
over events.

Patterns describe the structure of events with context conditions. Program paths can be described by path
expressions over events. All this makes it possible to write assertions not only about variable values at program points but
also about data and control flows in the target program. Assertions can also be used as conditions in rules which
describe debugging actions. For example, an error message is a typical action for a debugger or consistency checker.
Thus, it is also possible to specify debugging strategies.

The notions of event and event type are powerful abstractions which make it possible to write assertions
independent of any target program. Such generic assertions can be collected in standard libraries which represent the general
knowledge about typical bugs and debugging strategies and could be designed and distributed as special software
tools.

FORMAN is a general language to describe computations over program event trace that can be considered as an
example of a special programming paradigm. Possible application areas include program testing and debugging,
performance measurement and modeling, program profiling, program animation, program maintenance and program
documentation [5]. A study of FORMAN application for parallel programming is presented in [4].

2  Events,  Event  Traces,  and  the  Language  for  Computations  Over  Event  Traces 

FORMAN  is  based  on  a  semantic  model  of  target  program  behavior  in  which  the  program  execution  is  represented 
by  a  set  of  events.  An  event  occurs  when  some  action  is  performed  during  the  program  execution  process.  For 
instance,  a  message  is  sent  or  received,  a  statement  is  executed,  or  some  expression  is  evaluated.  A  particular  action 
may  be  performed  many  times,  but  every  execution  of  an  action  is  denoted  by  a  unique  event. 

Every event defines a time interval which has a beginning and an end. For atomic events, the beginning and end
points of the time interval will be the same. All events used for assertion checking and other computations over event
traces must be detectable by some implementation (e.g. by an appropriate target program instrumentation). Attributes
attached to events bring additional information about event context, such as current variable and expression values.

The model of target program behavior is formally defined through a set of general axioms about two basic
relations, which may or may not hold between two arbitrary events: they may be sequentially ordered (PRECEDES), or
one of them might be included in another composite event (IN). For each pair of events in the event trace no more
than one of these relations can be established.

There are several general axioms that should be satisfied by any events a, b, c in the event trace of any target program.

1)  Mutual  exclusion  of  relations. 


a  PRECEDES  b  =>  not  (a  IN  b)  and  not  (b  IN  a) 
a  IN  b  =>  not (a  PRECEDES  b)  and  not  (b  PRECEDES  a) 

2)  Noncommutativity. 

a  PRECEDES  b  =>  not (  b  PRECEDES  a) 
a  IN  b  =>  not (  b  IN  a) 

3)  Transitivity. 


(a  PRECEDES  b  )  and  (  b  PRECEDES  c  )  =>  (a  PRECEDES  c) 

(a  IN  b  )  and  (  b  IN  c  )  =>  (  a  IN  c) 

Irreflexivity for PRECEDES and IN follows from 2). Note that PRECEDES and IN are irreflexive partial orderings.

4)  Distributivity 


(a  IN  b)  and  (b  PRECEDES  c)  =>  (a  PRECEDES  c) 

(a  PRECEDES  b)  and  (c  IN  b)  =>  (a  PRECEDES  c) 

(FOR  ALL  a  IN  b  (FOR  ALL  c  IN  d  (a  PRECEDES  c)  ) )  =>  (b  PRECEDES  d) 
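
To make the axioms more tangible, the sketch below checks axioms 1) to 3) and the first two distributivity axioms on a tiny trace. It is a Python illustration under a simplifying assumption that is not part of the formal model: each event is represented by a hypothetical (begin, end) pair of step-counter values, PRECEDES is read as "ends before the other begins", and IN as strict containment. The event names are invented, and the third distributivity axiom is deliberately left out of the check.

```python
from itertools import product

# Hypothetical trace: event name -> (begin, end) step-counter values.
trace = {
    "call_f": (1, 10),
    "stmt_1": (2, 4),
    "stmt_2": (5, 9),
    "expr":   (6, 8),
    "call_g": (11, 12),
}

def precedes(a, b):
    return trace[a][1] < trace[b][0]

def inside(a, b):            # the IN relation, read as strict containment
    return trace[b][0] < trace[a][0] and trace[a][1] < trace[b][1]

def axioms_hold():
    names = list(trace)
    for a, b in product(names, repeat=2):
        # 1) mutual exclusion and 2) noncommutativity
        if precedes(a, b) and (inside(a, b) or inside(b, a) or precedes(b, a)):
            return False
        if inside(a, b) and (precedes(a, b) or precedes(b, a) or inside(b, a)):
            return False
    for a, b, c in product(names, repeat=3):
        # 3) transitivity
        if precedes(a, b) and precedes(b, c) and not precedes(a, c):
            return False
        if inside(a, b) and inside(b, c) and not inside(a, c):
            return False
        # 4) the first two distributivity axioms
        if inside(a, b) and precedes(b, c) and not precedes(a, c):
            return False
        if precedes(a, b) and inside(c, b) and not precedes(a, c):
            return False
    return True

print(axioms_hold())
```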


In order to define the behavior model for some target language, types of events are introduced. Each event belongs
to one or more of predefined event types, which are induced by target language abstract syntax (e.g. execute-statement,
send-message, receive-message) or by target language semantics (rendezvous, wait, put-message-in-queue).

The target program execution model is defined by an event grammar. The event may be a compound object and the
grammar describes how the event is split into other event sequences or sets. For example, the event
execute-assignment-statement contains a sequence of events evaluate-right-hand-part and execute-destination. The
evaluate-right-hand-part, in turn, consists of a unique event evaluate-expression. The event grammar is a set of axioms that describe
possible patterns of basic relations between events of different type in the program execution history; it is not intended
to be used for parsing actual event trace.


The rule A :: ( B C ) establishes that if an event a of the type A occurs in the trace of a program, it is necessary
that events b and c of types B and C also exist, such that the relations b IN a, c IN a, b PRECEDES c hold.


For example, the event grammar describing the semantics of a PASCAL subset may contain the following rules.
The names, such as execute-program and ex-stmt, denote event types.


execute-program  ::  ( ex-stmt  *  ) 


This means that each event of the type execute-program contains an ordered (w.r.t. relation PRECEDES)
sequence of zero or more events of the type ex-stmt.


ex-stmt ::  (  label?  ( ex-assignment  |  ex-read-stmt  |  ex-write-stmt  | 
ex-reset-stmt  |  ex-rewrite-stmt  |  ex-close-stmt  |  ex-cond-stmt  | 
ex-loop-stmt  |  call-procedure) ) 

The event of the type ex-stmt contains one of the events ex-assignment, ex-read-stmt, and so on.
This inner event determines the particular type of statement executed and may be preceded by an optional event of the
type label (traversing a label attached to the statement).

ex-assignment  ::  (ex-righthand-part  destination) 

The order of event occurrences reflects the semantics of the target language. When performing an assignment
statement, first the right-hand part is evaluated and after this the destination event occurs (which denotes the assignment
event itself). The event grammar makes FORMAN suitable for automatic source code instrumentation to detect all
necessary events.
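
A rule such as ex-assignment :: (ex-righthand-part destination) constrains which events a composite event contains and in what order. The Python sketch below shows one way to check a single composite event against such a rule. It is a simplified illustration: the nested (type, children) representation and the function satisfies_rule are assumptions made here, and only plain sequences are handled (no *, ?, or | operators).

```python
def satisfies_rule(event, rule):
    """Check one composite event against a rule of the form A :: (B C ...).

    `event` is (type, [child events]); children are listed in PRECEDES order.
    `rule` maps an event type to the required sequence of child types.
    """
    event_type, children = event
    if event_type not in rule:
        return True                      # no rule constrains this event type
    return [child[0] for child in children] == rule[event_type]

rule = {"ex-assignment": ["ex-righthand-part", "destination"]}
good = ("ex-assignment", [("ex-righthand-part", []), ("destination", [])])
bad = ("ex-assignment", [("destination", []), ("ex-righthand-part", [])])
print(satisfies_rule(good, rule), satisfies_rule(bad, rule))
```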


An  event  has  attributes,  for  instance,  source  text  fragment  from  the  corresponding  target  program,  current  values  of 
target  program  variables  and  expressions  at  the  beginning  and  at  the  end  of  event,  duration  of  the  event,  previous  path 
(i.e.  set  of  events  preceding  the  event  in  the  target  program  execution  history),  etc. 

FORMAN supplies a means for writing assertions about events and event sequences and sets. These include quantifiers and other aggregate operations over events, e.g., sequence, bag and set constructors, boolean operations and operations of target language to write assertions on target program variables [2], [3]. Events can be described by patterns which capture the structure of event and context conditions. Program paths can be described by regular path expressions over events.

The main extension for the parallel case [4] consists of the introduction of a new kind of composite event, the “snapshot,” which can be considered an abstraction for the notion “a set of events that may happen at the same time.” The snapshot event is a set of events each pair of which is not under the relation PRECEDES. This makes it possible to describe and to detect at run-time such typical parallel processing faults as data races and deadlock states.
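A minimal sketch of the idea behind snapshots, assuming the PRECEDES relation is available as an explicit set of ordered pairs (the event names w1, w2, w3 are hypothetical): events that are unordered in both directions are candidates for having happened at the same time.

```python
from itertools import combinations

def unordered_pairs(events, precedes):
    """Return pairs of events not related by PRECEDES in either direction."""
    return [(a, b) for a, b in combinations(events, 2)
            if (a, b) not in precedes and (b, a) not in precedes]

# Hypothetical trace: w1 and w2 write the same variable in two parallel
# processes; w3 is ordered after both of them.
writes = ["w1", "w2", "w3"]
precedes = {("w1", "w3"), ("w2", "w3")}
races = unordered_pairs(writes, precedes)
print(races)  # [('w1', 'w2')]
```

A non-empty result for writes to one shared variable would be reported as a possible data race.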


3  Examples  of  Debugging  Rules  and  Queries 

In general, a debugging rule performs some actions that may include computations over the target program execution history. The aim is to generate informative messages and to provide the user with some values obtained from the trace in order to detect and localize bugs. Rules can provide dialog to the user as well. An assertion is a boolean expression that may contain quantifiers and sequencing constraints over events.

Assertions  can  be  used  as  conditions  in  the  rules  describing  actions  that  can  be  performed  if  an  assertion  is  satisfied 
or  violated.  A  debugging  rule  has  the  form: 

assertion  SAY  (expression  sequence) 

ONFAIL SAY (expression sequence)

The  presence  of  metavariables  in  the  assertion  makes  it  possible  to  use  FORMAN  as  a  debugger  query  language. 
The  computation  of  an  assertion  is  interrupted  when  it  becomes  clear  that  the  final  value  will  be  False,  and  the  current 
values  of  metavariables  can  be  used  to  generate  readable  and  informative  messages. 

The following examples have been executed on our prototype FORMAN/PASCAL assertion checker [2], [3]. The PASCAL program reads a sequence of integers from file XX.TXT.


program e1;
var X: integer;
    XX: file of text;
begin
  X := 7;
  (* initial value is assigned here *)
  reset(XX, 'XX.TXT');
  while X <> 0 do
    read(XX, X)
end.


The  contents  of  the  file  XX.TXT  are  as  follows: 

11 5 3 7 8 9 3 13 2 3 45 8 754 45567 0

Example of a Query 1. In order to obtain the history of variable X, the following computation over the event trace can be performed. The rule condition is TRUE, and the whole history of variable X is shown as a side effect.

TRUE 

SAY  ( ’The  history  of  variable  X  is:’ 

[D:  destination  IS  X  FROM  execute_program  APPLY  VALUE(D)  ]  ) 

The [ ... ] construct above defines a loop over the whole program execution trace (execute_program event). All events matching the pattern destination IS X are selected from the trace and the function VALUE is applied to them. The resulting sequence consists of values assigned to the X variable during the program execution.

When  executed  on  our  prototype  the  following  output  is  produced: 

Assertion  #1  checked  successfully. 

The history of variable X is: 7 11 5 3 7 8 9 3 13 2 45 8 754 45567 0
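The selection performed by this query can be imitated in Python as a rough sketch. The run_e1 function simulates program e1 and records one destination event per assignment to X. The dictionary layout of events and the shortened input are assumptions made for illustration; they do not reproduce the prototype's trace format.

```python
def run_e1(records):
    """Simulate program e1, logging one 'destination' event per assignment to X."""
    trace = []
    x = 7
    trace.append({"type": "destination", "var": "X", "value": x})
    source = iter(records)
    while x != 0:
        x = next(source)  # corresponds to read(XX, X)
        trace.append({"type": "destination", "var": "X", "value": x})
    return trace

records = [11, 5, 0]  # shortened hypothetical input, not the file XX.TXT
trace = run_e1(records)
# Counterpart of [D: destination IS X FROM execute_program APPLY VALUE(D)]
history = [e["value"] for e in trace
           if e["type"] == "destination" and e["var"] == "X"]
print("The history of variable X is:", *history)
```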

Example of an Assertion 2. Let’s write and check the assertion: “The value of variable X does not exceed 17.”

FOREACH *S: ex_stmt CONTAINS (D: destination IS X) FROM execute_program
VALUE(D) < 17
ONFAIL
SAY('Value ' VALUE(D) 'is assigned to the variable X in stmt ')
SAY(S)
SAY('This is record #' CARD[ ex_read_stmt FROM PREV_PATH(S) ] + 1 'in the file XX.TXT')

We check the assertion for all events where the value of X may be altered. These are events of the type destination which can appear within ex_assignment_stmt or ex_read_stmt events. In order to make error messages about assertion violations more informative we include the embracing event of the type ex_stmt. Metavariables S and D refer to those events of interest. When the assertion is violated for the first time, the assertion evaluation terminates and current values of metavariables can be used for message output. The value of a metavariable when printed by the SAY clause is shown in the form:

event-type :> event-source-text
Time= event-begin-time .. event-end-time

Event  begin  and  end  times  in  this  prototype  implementation  are  simply  values  of  step  counter. 

Since we expect the assertion might be violated when executing a Read statement, it makes sense to report the record number of the input file XX.TXT where the assertion is violated. The program state does not contain any variables whose values could provide this information. But we can perform auxiliary calculations independently from the target program using FORMAN aggregate operations. In this particular case the number of events of the type ex_read_stmt preceding the interruption moment is counted. This number plus 1 (since the violation occurs when the read statement is executed) yields the number of the input record on which the variable X was first assigned a value exceeding 17.

Assertion #2 violation!

Value 45 is assigned to the variable X in stmt

ex_stmt :> Read ( XX , X ) Time= 73 .. 78

This is record # 11 in the file XX.TXT
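The auxiliary calculation of the record number can be sketched as follows, assuming the input file holds the fifteen records 11 5 3 7 8 9 3 13 2 3 45 8 754 45567 0 and that the trace is flattened into a hypothetical list of statement events, each carrying the value it assigns to X.

```python
def first_violation(stmts, limit=17):
    """Return (value, record_number) for the first statement that breaks value < limit."""
    for i, s in enumerate(stmts):
        if not s["value"] < limit:
            # Counterpart of CARD[ex_read_stmt FROM PREV_PATH(S)] + 1
            reads_before = sum(1 for p in stmts[:i] if p["kind"] == "ex_read_stmt")
            return s["value"], reads_before + 1
    return None

stmts = [{"kind": "ex_assignment", "value": 7}] + [
    {"kind": "ex_read_stmt", "value": v}
    for v in [11, 5, 3, 7, 8, 9, 3, 13, 2, 3, 45, 8, 754, 45567, 0]
]
print(first_violation(stmts))  # (45, 11)
```

As in the rule above, the record number is only meaningful when the violating statement is a read statement.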

Example of a Query 3. Profile measurement. In order to obtain the actual number of statements executed, the following query can be performed:

TRUE
SAY('The total number of statements executed is:'
CARD[ ALL ex_stmt FROM execute_program ])

The ALL option in the aggregate operation indicates that all nested events of the type ex_stmt should be taken into account.


Assertion  #3  checked  successfully. 

The  total  number  of  statements  executed  is:  18 
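The effect of the ALL option can be sketched as a recursive count over a nested event structure. The dictionary representation with a children list is an assumption of this sketch, and the small trace below is hypothetical rather than the trace of program e1.

```python
def count_all(event, type_name):
    """Count events of type_name nested at any depth inside event (the ALL option)."""
    own = 1 if event["type"] == type_name else 0
    return own + sum(count_all(c, type_name) for c in event.get("children", []))

# Hypothetical trace: one simple statement and a loop statement
# whose body was executed twice.
loop = {"type": "ex_stmt", "children": [{"type": "ex_stmt"}, {"type": "ex_stmt"}]}
program = {"type": "execute_program", "children": [{"type": "ex_stmt"}, loop]}
print(count_all(program, "ex_stmt"))  # 4
```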


Example of a generic assertion which must be true for any program in the target language:

“Each variable has to be assigned a value before it is used in an expression evaluation.”

FOREACH * S: ex_stmt FROM execute_program
FOREACH * E: eval_expression CONTAINS (V: variable) FROM S
EXISTS D: destination FROM PREV_PATH(E) SOURCE_TEXT(D) = SOURCE_TEXT(V)
ONFAIL
SAY('In event' S)
SAY('in expression evaluation')
SAY(E)
SAY('uninitialized variable' SOURCE_TEXT(V) 'is used')

For the following PASCAL program our prototype detects the presence of the bug described above.

program e2;
var X, Y: integer;
begin
  Y := 3;
  if Y < 2 then begin
    X := 7; Y := Y + X
  end
  else Y := X - Y (*** here the error appears: X has no value! ***)
end.

Assertion  #4  violation! 

In event ex_stmt :> If ( Y < 2 ) then X := 7 ; Y := ( Y + X ) ;
else Y := ( X - Y ) ; Time= 10 .. 35

in expression evaluation

eval_expression :> ( X - Y ) Time= 20 .. 29

uninitialized variable X is used

Debugging rules can be considered as a way of formalizing reasoning about the target program execution; humans often use similar patterns for reasoning when debugging programs. For example, if the index expression of an array element is out of range, the debugger can try a rule for eval-index events that invokes another rule about a wrong value of the event eval-expression, which in turn will cause investigation of the histories of all variables included in the expression.
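The generic assertion about uninitialized variables shown earlier amounts to a single pass over the trace. A sketch follows, under the assumption that the trace is flattened into an ordered list of (kind, name) pairs; this layout is hypothetical and chosen only for illustration.

```python
def uninitialized_uses(trace):
    """Yield (index, name) for each variable read with no earlier destination event."""
    assigned = set()
    for i, (kind, name) in enumerate(trace):
        if kind == "destination":
            assigned.add(name)
        elif kind == "variable" and name not in assigned:
            yield i, name

# Hypothetical flattened trace of program e2:
# Y := 3; test Y < 2; else branch Y := X - Y.
trace_e2 = [("destination", "Y"), ("variable", "Y"),
            ("variable", "X"), ("variable", "Y"), ("destination", "Y")]
print(list(uninitialized_uses(trace_e2)))  # [(2, 'X')]
```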

Yet  another  application  of  generic  assertions  and  debugging  rules  may  be  for  describing  run-time  constraints 
(sequences  of  procedure  calls,  actual  parameter  dependences,  etc.)  for  nontrivial  subroutine  packages,  e.g.  for  the 
MOTIF  package  for  GUI  design.  A  library  containing  assertions  and  debugging  rules  relevant  to  such  a  package  may 
be  useful  for  writing  C  programs  calling  subroutines  from  the  package. 

4  Conclusions 

In brief, our approach can be explained as “computations over a target program event trace.” We expect the advantages of our approach to be the following:

•  The  notion  of  an  event  grammar  provides  a  general  basis  for  program  behavior  models.  In  contrast  with  previous 
approaches,  the  event  is  not  a  point  in  the  trace  but  an  interval  with  a  beginning  and  an  end. 

•  Event  grammar  provides  a  coordinate  system  to  refer  to  any  interesting  event  in  the  execution  history.  Program 
variable  values  are  attributes  of  an  event’s  beginning  and  end.  Event  attributes  provide  complete  access  to  each 
target  program’s  execution  state.  Assertions  about  particular  execution  states  as  well  as  assertions  about  sets  of 
different  execution  states  may  be  checked. 

•  The  PRECEDES  relation  yields  a  partial  order  on  the  set  of  events,  which  is  a  natural  model  for  parallel  program 
behavior. 

•  The  IN  relation  yields  a  hierarchy  of  events,  so  the  assertions  can  be  defined  at  an  appropriate  level  of  granularity. 

•  A  language  for  computations  over  event  traces  provides  a  uniform  framework  for  assertion  checking,  profiles, 
debugging  queries,  and  performance  measurements. 

•  The  access  to  the  complete  target  program  execution  history  and  the  ability  to  formalize  generic  assertions  can  be 
used  in  order  to  define  debugging  rules  and  strategies. 

•  The fact that assertions and other computations over the target program event trace can be separated from the text of the target program allows accumulation of formalized knowledge about particular programs and about the whole target language in separate files. This makes it easy to control the number of assertions to be checked.

According to [7] and [12], approximately 40-50% of all bugs detected during program testing are logic, structural, and functionality bugs, i.e. bugs which could be detected by appropriate assertion checking similar to that demonstrated above.

Abstract. This paper suggests an approach to the development of software testing and debugging automation tools based on precise program behavior models. The program behavior model is defined as a set of events (event trace) with two basic binary relations over events, precedence and inclusion, and represents the temporal relationship between actions. A language for computations over event traces is developed that provides a basis for assertion checking, debugging queries, execution profiles, and performance measurements.

The approach is nondestructive, since assertion texts are separated from the target program source code and can be maintained independently. Assertions can both capture the dynamic properties of a particular target program and formalize the general knowledge of typical bugs and debugging strategies. An event grammar provides a sound basis for assertion language implementation via automatic instrumentation of the target program. Event grammars may be designed for sequential as well as for parallel programs. The approach suggested can be adjusted to a variety of programming languages.

Keywords.  Program  behavior  models,  events,  event  grammars,  software  testing  and  debugging  automation 


1  Introduction 

Dynamic program analysis is one of the least understood activities in software development. A major problem is still the inability to express the mismatch between the expected and the observed behavior of the program on the level of abstraction maintained by the user [11]. In other words, a flexible and expressive specification formalism is needed to describe properties of the software system’s implementation. Program testing and debugging is still a human activity performed largely without any adequate tools and consuming more than 50% of the total program development time and effort [10]. Debugging concurrent programs is even more difficult because of parallel activities, non-determinism and time-dependent behavior.

One way to improve the situation is to partially automate the debugging process. A precise model of program behavior is the first step towards debugging automation. It appears that traditional methods of programming language semantics definition don’t address this aspect.

In building such a model several considerations were taken into account. The first assumption we make is that the model is discrete, i.e. comprises a finite number of well-separated elements. This assumption is typical for Computer Science methods used for static and dynamic analysis of programs. For this reason the notion of event as an elementary unit of action is an appropriate basis for building the whole model. The event is an abstraction for any detectable action performed during the program execution, such as a statement execution, expression evaluation, procedure call, sending and receiving a message, etc.


Actions (or events) are evolving in time and the program behavior represents the temporal relationship between actions. This implies the necessity to introduce an ordering relation for events. Semantics of parallel programming languages and even some sequential languages (such as C) don’t require the total ordering of actions, so partial event ordering is the most adequate method for this purpose [18].


Actions performed during the program execution are at different levels of granularity; some of them include other actions, e.g. a subroutine call event contains statement execution events. This consideration brings the inclusion relation to our model. Under this relationship events can be hierarchical objects and it becomes possible to consider program behavior at appropriate levels of granularity.


Finally, the program execution can be modeled as a set of events (event trace) with two basic relations: partial ordering and inclusion. The event trace actually is a model of the temporal aspect of the program's behavior. In order to specify meaningful program behavior properties we have to enrich events with some attributes. An event may have a type and some other attributes, such as event duration, program source code related to the event, program state associated with the event (i.e. program variable values at the beginning and at the end of event), etc.


The next problem to be addressed after the program behavior model is set up is the formalism specifying properties of the program behavior. This could be done in many different ways, e.g. by adopting some kind of logic calculi (predicate logic, temporal logic). Such a direction leads to tools for program verification, or in more pragmatic incarnations to an approach called model checking [13]. As indicated in [1], “Dynamic analysis is limited to checking observed behaviors, and so in principle provides weaker assurances, but this is balanced by checking a wider range of properties and typically by better performance ....”


Since our goal is debugging automation, i.e. a kind of program dynamic analysis that requires different types of assertion checking, debugging queries, program execution profiles, and so on, we came up with the concept of a computation over the event trace. It seems that this concept is general enough to cover all the above mentioned needs in a unifying framework, and provides sufficient flexibility. This approach implies the design of a special programming language for computations over event traces. We suggest a particular language called FORMAN [2], [4], [16] based on the functional paradigm and the use of event patterns and aggregate operations over events.


Patterns describe the structure of events with context conditions. Program paths can be described by path expressions over events. All this makes it possible to write assertions not only about variable values at program points but also about data and control flows in the target program. Assertions can also be used as conditions in rules which describe debugging actions. For example, an error message is a typical action for a debugger or consistency checker. Thus, it is also possible to specify debugging strategies.

The  notions  of  event  and  event  type  are  powerful  abstractions  which  make  it  possible  to  write  assertions  independent  of 
any  target  program.  Such  generic  assertions  can  be  collected  in  standard  libraries  which  represent  the  general  knowledge 
about  typical  bugs  and  debugging  strategies  and  could  be  designed  and  distributed  as  special  software  tools. 

FORMAN is a general language to describe computations over program event trace that can be considered as an example of a special programming paradigm. Possible application areas include program testing and debugging, performance measurement and modeling, program profiling, program animation, program maintenance and program documentation [6]. A study of FORMAN application for parallel programming is presented in [5].

2  Events 

FORMAN is based on a semantic model of target program behavior in which the program execution is represented by a set of events. An event occurs when some action is performed during the program execution process. For instance, a message is sent or received, a statement is executed, or some expression is evaluated. A particular action may be performed many times, but every execution of an action is denoted by a unique event.

Every event defines a time interval which has a beginning and an end. For atomic events, the beginning and end points of the time interval will be the same. All events used for assertion checking and other computations over event traces must be detectable by some implementation (e.g. by an appropriate target program instrumentation). Attributes attached to events bring additional information about event context, such as current variable and expression values.


In order to give some support for our notion of event let us consider a well-known idea such as a counter. Usually the history of a variable X when used as a counter looks like:

X := 0; ...
Loop ...
X := X + 1; ...
endloop; ...

To check whether the actual behavior of the counter X matches the pattern described by the program fragment above we have to consider the following events. Let Initialize_X denote the event of assigning 0 to the variable X, Augment_X denote the event of incrementing X, and Assign_X denote an event of assigning any value to the variable X. One could check whether X behaves as a counter when a program segment S is executed in the following way. First, the sequence A of all events of the type Assign_X from the event trace of program segment S has to be extracted, preserving the ordering between events. Second, A has to be matched with the pattern:

Initialize_X (Augment_X)*

where '*' denotes repetition zero or more times. If the actual sequence of events does not match this pattern we can report an error. Therefore, assertion checking can be represented as a kind of computation over target program event trace.
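The counter check described above can be sketched with an ordinary regular expression. The sketch assumes that each Assign_X event is available as a pair (old value, new value), and it classifies events from these values; both are simplifications introduced here for illustration.

```python
import re

def behaves_as_counter(assign_events):
    """assign_events: ordered list of (old_value, new_value) for every Assign_X event."""
    codes = []
    for old, new in assign_events:
        if new == 0:
            codes.append("I")  # Initialize_X
        elif old is not None and new == old + 1:
            codes.append("A")  # Augment_X
        else:
            codes.append("?")  # any other assignment to X
    # Counterpart of the pattern Initialize_X (Augment_X)*
    return re.fullmatch(r"IA*", "".join(codes)) is not None

print(behaves_as_counter([(None, 0), (0, 1), (1, 2)]))  # True
print(behaves_as_counter([(None, 0), (0, 1), (1, 5)]))  # False
```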

Another informal example involves parallel events. Let us suppose that Assign_Y denotes an event of assigning a value to the shared variable Y through any of several parallel processes. Then, detecting a set of events of the type Assign_Y that happen “at the same time” (i.e. are not under the precedence relation) may be evidence of a possible data-race condition in the program execution.


The program state (current values of variables) can be considered at the beginning or at the end of an appropriate event. This provides the opportunity to write assertions about program variable values at different points in the program execution history.

Program profiling usually is based on counting the number of events of some type, e.g. the number of statement executions or procedure calls. Performance measurements may be based on attaching the duration attribute to such events and summarizing durations of selected events.
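Both kinds of measurement reduce to simple aggregate computations over the trace. A sketch follows, assuming events are given as hypothetical (type, begin, end) triples measured in step-counter ticks.

```python
from collections import Counter

events = [  # hypothetical (type, begin, end) triples
    ("ex_stmt", 0, 4), ("call_procedure", 1, 3), ("ex_stmt", 5, 9),
]
# Profile: number of events of each type.
profile = Counter(t for t, _, _ in events)
# Performance measurement: total duration of the selected events.
time_in_stmts = sum(end - begin for t, begin, end in events if t == "ex_stmt")
print(profile["ex_stmt"], time_in_stmts)  # 2 8
```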


3  Event  Trace  and  the  Language  for  Computations  Over  Event  Traces 

FORMAN is a high-level specification language for expressing intended behavior or known types of error conditions when debugging or testing programs. It is intended to be used in conjunction with a high-level programming language which is called the target language.


The model of target program behavior is formally defined through a set of general axioms about two basic relations, which may or may not hold between two arbitrary events: they may be sequentially ordered (PRECEDES), or one of them might be included in another composite event (IN). For each pair of events in the event trace no more than one of these relations can be established.

There are several general axioms that should be satisfied by any events a, b, c in the event trace of any target program.

1) Mutual exclusion of relations.

a PRECEDES b => not (a IN b) and not (b IN a)
a IN b => not (a PRECEDES b) and not (b PRECEDES a)

2) Noncommutativity.

a PRECEDES b => not (b PRECEDES a)
a IN b => not (b IN a)

3) Transitivity.

(a PRECEDES b) and (b PRECEDES c) => (a PRECEDES c)
(a IN b) and (b IN c) => (a IN c)

Irreflexivity for PRECEDES and IN follows from 2). Note that PRECEDES and IN are irreflexive partial orderings.

4) Distributivity.

(a IN b) and (b PRECEDES c) => (a PRECEDES c)
(a PRECEDES b) and (c IN b) => (a PRECEDES c)
(FOR ALL a IN b (FOR ALL c IN d (a PRECEDES c))) => (b PRECEDES d)
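For a finite recorded trace the axioms can be checked mechanically. The following sketch assumes both relations are stored as explicit sets of ordered pairs and covers axioms 1) to 3) and the first two distributivity rules; the third distributivity rule is omitted for brevity. The event names are hypothetical.

```python
from itertools import product

def check_axioms(events, precedes, inside):
    """Return names of violated axioms; precedes/inside are sets of (a, b) pairs."""
    bad = set()
    for a, b in product(events, repeat=2):
        if (a, b) in precedes and ((a, b) in inside or (b, a) in inside):
            bad.add("mutual exclusion")
        if (a, b) in precedes and (b, a) in precedes:
            bad.add("noncommutativity")
        if (a, b) in inside and (b, a) in inside:
            bad.add("noncommutativity")
    for a, b, c in product(events, repeat=3):
        for rel in (precedes, inside):
            if (a, b) in rel and (b, c) in rel and (a, c) not in rel:
                bad.add("transitivity")
        if (a, b) in inside and (b, c) in precedes and (a, c) not in precedes:
            bad.add("distributivity")
        if (a, b) in precedes and (c, b) in inside and (a, c) not in precedes:
            bad.add("distributivity")
    return sorted(bad)

# Hypothetical trace: event e is IN statement s1, and s1 PRECEDES s2.
events = ["s1", "s2", "e"]
inside = {("e", "s1")}
print(check_axioms(events, {("s1", "s2"), ("e", "s2")}, inside))  # []
print(check_axioms(events, {("s1", "s2")}, inside))  # ['distributivity']
```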


In order to define the behavior model for some target language, types of events are introduced. Each event belongs to one or more of the predefined event types, which are induced by target language abstract syntax (e.g. execute-statement, send-message, receive-message) or by target language semantics (e.g. rendezvous, wait, put-message-in-queue).


The target program execution model is defined by an event grammar. The event may be a compound object and the grammar describes how the event is split into other event sequences or sets. For example, the event execute-assignment-statement contains a sequence of events evaluate-right-hand-part and execute-destination. The evaluate-right-hand-part, in turn, consists of a unique event evaluate-expression. The event grammar is a set of axioms that describe possible patterns of basic relations between events of different type in the program execution history; it is not intended to be used for parsing the actual event trace.


The rule A :: ( B C ) establishes that if an event a of the type A occurs in the trace of a program, it is necessary that events b and c of types B and C also exist, such that the relations b IN a, c IN a, b PRECEDES c hold.

For example, the event grammar describing the semantics of a PASCAL subset may contain the following rules. The names, such as execute-program and ex-stmt, denote event types.

execute-program :: ( ex-stmt * )

This means that each event of the type execute-program contains an ordered (w.r.t. relation PRECEDES) sequence of zero or more events of the type ex-stmt.


ex-stmt :: ( label? ( ex-assignment | ex-read-stmt | ex-write-stmt |
ex-reset-stmt | ex-rewrite-stmt | ex-close-stmt | ex-cond-stmt |
ex-loop-stmt | call-procedure ) )

The event of the type ex-stmt contains one of the events ex-assignment, ex-read-stmt, and so on. This inner event determines the particular type of statement executed and may be preceded by an optional event of the type label (traversing a label attached to the statement).

ex-assignment :: (ex-righthand-part destination)

The order of event occurrences reflects the semantics of the target language. When performing an assignment statement, first the right-hand part is evaluated and after this the destination event occurs (which denotes the assignment event itself). The event grammar makes FORMAN suitable for automatic source code instrumentation to detect all necessary events.

An event has attributes, for instance, source text fragment from the corresponding target program, current values of target program variables and expressions at the beginning and at the end of event, duration of the event, previous path (i.e. set of events preceding the event in the target program execution history), etc.

FORMAN supplies a means for writing assertions about events and event sequences and sets. These include quantifiers and other aggregate operations over events, e.g., sequence, bag and set constructors, boolean operations and operations of target language to write assertions on target program variables [3], [4].

Events can be described by patterns which capture the structure of event and context conditions. Program paths can be described by regular path expressions over events.

The main extension for the parallel case consists of the introduction of a new kind of composite event, the “snapshot,” which can be considered as an abstraction for the notion “a set of events that may happen at the same time.” The snapshot event is a set of events each pair of which is not under the relation PRECEDES, and it makes it possible to describe and to detect at run-time such typical parallel processing faults as data races and deadlock states.


All this makes it possible to formalize assertions of the following types:

•  “file must be opened, then the read statement is performed zero or more times and after that the close statement is executed,”

•  “at least one variable changes its value during one loop L iteration,”

•  “after the execution of a subprogram P the value of variable X remains unchanged,”

•  “there is an attempt to assign values to the same variable in two parallel processes” (data race condition),

•  “deadlock for parallel processes P1 and P2 is detected.”

In addition to debugging and testing, FORMAN can also be used to specify profiles and performance measurements.


4  Examples of Debugging Rules and Queries

In general, a debugging rule performs some actions that may include computations over the target program execution history. The aim is to generate informative messages and to provide the user with some values obtained from the trace in order to detect and localize bugs. Rules can provide dialog to the user as well. An assertion is a boolean expression that may contain quantifiers and sequencing constraints over events.

Assertions can be used as conditions in the rules describing actions that can be performed if an assertion is satisfied or violated. A debugging rule has the form:

assertion SAY (expression sequence)

ONFAIL SAY (expression sequence)


The presence of metavariables in the assertion makes it possible to use FORMAN as a debugger query language. The computation of an assertion is interrupted when it becomes clear that the final value will be False, and the current values of metavariables can be used to generate readable and informative messages.

The following examples have been executed on our prototype FORMAN/PASCAL assertion checker [3], [4]. The PASCAL program reads a sequence of integers from file XX.TXT.


program e1;
var X: integer;
    XX: file of text;
begin
  X := 7;
  (* initial value is assigned here *)
  reset(XX, 'XX.TXT');
  while X <> 0 do
    read(XX, X)
end.


The  contents  of  the  file  XX.TXT  are  as  follows: 


11  5  3  7  8  9  3 


13  2  3  45  8  754 


45567  0 


Example of a Query 1. In order to obtain the history of variable X, the following computation over the event trace can be
performed. The rule condition is TRUE, and the whole history of variable X is shown as a side effect.

TRUE
SAY('The history of variable X is:' [D: destination IS X FROM execute_program APPLY VALUE(D)])

The [ ... ] construct above defines a loop over the whole program execution (the execute_program event). All events
matching the pattern destination IS X are selected from the trace and the function VALUE is applied to them. The
resulting sequence consists of values assigned to the X variable during the program execution. When executed on our
prototype the following output is produced:


Assertion #1 checked successfully.
The history of variable X is: 7 11 5 3 7 8 9 3 13 2 3 45 8 754 45567 0


Example of an Assertion 2. Let us write and check the assertion: "The value of variable X does not exceed 17."

FOREACH * S: ex_stmt CONTAINS (D: destination IS X) FROM execute_program
    VALUE(D) < 17
ONFAIL
    SAY('Value' VALUE(D) 'is assigned to the variable X in stmt')
    SAY(S)
    SAY('This is record #' CARD[ex_read_stmt FROM PREV_PATH(S)] + 1 'in the file XX.TXT')

We check the assertion for all events where the value of X may be altered. These are events of the type destination,
which can appear within ex_assignment_stmt or ex_read_stmt events. In order to make error messages about
assertion violations more informative we include the embracing event of the type ex_stmt. Metavariables S and D refer
to these events, and the values of metavariables can be used for message output. The value of a metavariable when
printed by the SAY clause is shown in the form:

event-type :> event-source-text Time= event-begin-time .. event-end-time

Event begin and end times in this prototype implementation are simply values of step counter.


Since we expect the assertion might be violated when executing a Read statement, it makes sense to report the record
number of the input file XX.TXT where the assertion is violated. The program state does not contain any variables whose
values could provide this information. But we can perform auxiliary calculations independently from the target program
using FORMAN aggregate operations. In this case the number of events of the type ex_read_stmt preceding the
interruption moment is counted. This number plus 1 (since the violation occurs when the read statement is executed)
yields the number of an input record on which the variable X was first assigned the value exceeding 17.


Assertion #2 violation!
Value 45 is assigned to the variable X in stmt
ex_stmt :> Read( XX , X ) Time= 73 .. 78
This is record # 11 in the file XX.TXT
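The two rules above can be mimicked outside FORMAN to see what they compute. The following Python sketch is an illustration only, not the FORMAN/PASCAL prototype: Event, run_e1, history, and first_violation are hypothetical names, the trace comes from a hand-written simulation of program e1, and the bound 17 is an assumption taken from the assertion text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str    # "ex_assignment_stmt" or "ex_read_stmt"
    dest: str    # name of the variable that receives a value
    value: int   # value assigned by this event


def run_e1(records):
    """Simulate program e1, recording one event per executed assignment or read."""
    trace = [Event("ex_assignment_stmt", "X", 7)]    # X := 7
    for x in records:                                # while X <> 0 do read(XX, X)
        trace.append(Event("ex_read_stmt", "X", x))
        if x == 0:
            break
    return trace


def history(trace, var):
    """Query 1: values assigned to var, in trace order."""
    return [e.value for e in trace if e.dest == var]


def first_violation(trace, var, limit):
    """Assertion 2: every value assigned to var is below limit.

    Returns None if the assertion holds, otherwise (value, record_number),
    where record_number is the count of preceding read events plus one.
    """
    reads_before = 0
    for e in trace:
        if e.dest == var and e.value >= limit:
            return e.value, reads_before + 1
        if e.kind == "ex_read_stmt":
            reads_before += 1
    return None


records = [11, 5, 3, 7, 8, 9, 3, 13, 2, 3, 45, 8, 754, 45567, 0]
trace = run_e1(records)
print(history(trace, "X"))
print(first_violation(trace, "X", 17))
```

With the file contents listed earlier, the sketch reports the value 45 at record 11, which agrees with the message printed by the prototype.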


Example of a Query 3. Profile measurement. In order to obtain the actual number of statements executed, the following
query can be performed:

TRUE
SAY('The total number of statements executed is:' CARD[ALL ex_stmt FROM execute_program])


The ALL option in the aggregate operation indicates that all nested events of the type ex_stmt should be taken into
account.

Assertion #3 checked successfully.

The  total  number  of  statements  executed  is :  if 


Example of a generic assertion which must be true for any program in the target language.

"Each variable has to be assigned a value before it is used in an expression evaluation."

FOREACH * S: ex_stmt FROM execute_program
FOREACH * E: eval_expression CONTAINS (V: variable) FROM S
EXISTS D: destination FROM PREV_PATH(E) SOURCE_TEXT(D) = SOURCE_TEXT(V)
ONFAIL
    SAY('In event' S)
    SAY('in expression evaluation')
    SAY(E)
    SAY('uninitialized variable' SOURCE_TEXT(V) 'is used')


For the following PASCAL program our prototype detects the presence of the bug described above.

program e2;
var X, Y: integer;
begin
  Y := 3;
  if Y < 2 then begin X := 7; Y := Y + X end
  else Y := X - Y   (*** here the error appears: X has no value! ***)
end.


Assertion #4 violation!
In event ex_stmt :> If ( Y < 2 ) then X := 7 ; Y := ( Y + X ) ; else Y := ( X - Y ) ; Time= 10 .. 35
in expression evaluation
eval_expression :> ( X - Y ) Time= 20 .. 29
uninitialized variable X is used
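The generic assertion amounts to a single pass over the trace that remembers which variables have already been a destination. The Python sketch below illustrates that idea under simplifying assumptions: the trace is a flat list of (kind, name) pairs written by hand for program e2, and uninitialized_uses is a hypothetical helper, not part of FORMAN.

```python
def uninitialized_uses(trace):
    """Return (position, variable) for each use with no earlier assignment."""
    assigned = set()
    failures = []
    for position, (kind, name) in enumerate(trace):
        if kind == "destination":
            assigned.add(name)
        elif kind == "variable" and name not in assigned:
            failures.append((position, name))
    return failures


# Hand-written trace of program e2: Y := 3; test Y < 2; else branch Y := X - Y
trace_e2 = [
    ("destination", "Y"),   # Y := 3
    ("variable", "Y"),      # Y < 2
    ("variable", "X"),      # X - Y reads X ...
    ("variable", "Y"),      # ... and Y
    ("destination", "Y"),   # result stored into Y
]
print(uninitialized_uses(trace_e2))
```

Only X is reported, because Y was assigned before both of its uses.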


Debu88'ns  """  ”  "  as  a — — -  —  program  ..  homans 

■h=  even,  evasion,  which  in  turn  wi„  cause  investigation  „  ^  ^  ^  ^ 

Yet another application of generic assertions and debugging rules may be for describing run-time constraints
(sequences of procedure calls, dependences, etc.) for nontrivial subroutine packages, e.g. for the MOTIF
package for GUI design. A library containing assertions and debugging rules relevant to such a package may be useful for
writing C programs calling subroutines from the package.

5  Related  Work 

What follows is a very brief survey of basic ideas known in Debugging Automation to provide the background for the
approach advocated in this paper.


5.1  Event  Notion 

The Event Based Behavioral Abstraction (EBBA) method suggested in [8] characterizes the behavior of the whole
program in terms of both primitive and composite events. Context conditions involving event attribute values can be used to
distinguish events. EBBA defines two higher level means for modeling system behavior: clustering and filtering.
Clustering is used to express behavior as composite events, i.e. aggregates of previously defined events. Filtering serves to
eliminate from consideration events which are not relevant to the model being investigated. Both event recognition and
filtering can be performed at run-time.


An event-based debugger for the C programming language called Dalek [23] provides a means for description of
user-defined events which typically are points within a program execution trace. A target program has to be instrumented in
order to collect values of event attributes. Composite events can be recognized at run-time as collections of primitive
events.


FORMAN has a more comprehensive modeling approach than EBBA or Dalek, based on the event grammar. A
language for expressing computations over execution histories is provided, which is missing in EBBA and Dalek. The event
grammar makes FORMAN suitable for automatic source code instrumentation to detect all necessary events. FORMAN
supports design of universal assertions and debugging rules that could be used for debugging of arbitrary target programs.
This generality is missing in the EBBA and Dalek approaches. The event in FORMAN is a time interval, in contrast with the
event notion in previous approaches where events are considered pointwise time moments.

The COCA debugger [14] for the C language uses the GDB debugger for tracing and PROLOG for debugging query
execution. It provides a certain event grammar for C traces and event patterns based on attributes for event search. The
query language is designed around special primitives built into the PROLOG query evaluator. We assume that FORMAN
is more suitable for trace computations as it has been designed for this specific purpose.


5.2  Path  Expressions 

Data and control flow descriptions of the target program are essential for testing and debugging purposes. It is useful
to give such a description in an explicit and precise form. The path

programs  in  fm  *  k  h  “PreSS'°"  for  specify, „s  para|lel 

S°”eS“  <’m,al,sm  Tra“opooificatio„s  else  in[22Jf<)rso|iware 

pr.ee, 

suggested as kinds of debugger commands. FORMAN provides flexible language means for trace specification including
event patterns and regular expressions over them.

5.3 Assertion Languages

in  tne  1 5>L  [19],  [25]  annotation  language  for  Ada  the  nn,'  e 

TasbPaItt  „  ' °f' :™"'-”»‘iooed  in  order  ,o  describe,  he  behav,  prof 

number of special pre-defined predicates. Assertion-checking is dynamic at run-time, and does not need post-mortem
analysis. The RAPIDE project [21] provides a rich event-based assertion language for software architecture description.


o  l  1  evears  are  mrrodoced  ,o  describe  process  commupicabop,  .ermipario,  and  coppecop  apd  deepen,  of  pro- 

,0"  °f  d0M  “  "'"-'“"O-  No  corpposite  evepts  are  provided- 

Another recent experimental debugging tool is based on trace analysis with respect to assertions in temporal interval
logic. This work is presented in [17], where four types of events are introduced: assignment to variables, reaching a label,
interprocess communication, and process instantiation or termination. Composite events cannot be defined. Different
varieties of temporal logic languages are used for program static analysis called Model Checking [13].

In [26] a practical approach to programming with assertions for the C language is advocated, and it is demonstrated that
even local assertions associated with particular points within the program may be extremely useful for program
debugging.


The FORMAN language for computations over traces provides flexible means for writing both local and global
assertions, including those about temporal relations between events.


5.4  Algorithmic  Debugging 

The original algorithmic program debugging method was introduced in [28] for the Prolog language. In [27] and [15]
this paradigm is applied to a subset of PASCAL.

The debugger executes the program and builds a trace execution tree at the procedure level while saving some useful
trace information such as procedure names and input/output parameter values. The algorithmic debugger traverses the
execution tree and interacts with the user by asking about the intended behavior of each procedure. The user has the
possibility to answer "yes" or "no" about the intended behavior of the procedure. The search finally ends and a bug is localized
within a procedure p when one of the following holds: procedure p contains no procedure calls, or all procedure calls
performed from the body of procedure p fulfill the user's expectations.
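The traversal described above can be sketched in a few lines of Python. This is a simplified illustration rather than the method of [28], [27], or [15]: Call and localize are hypothetical names, and the user's yes/no answers are replaced by a callback.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str
    children: list = field(default_factory=list)


def localize(call, behaves_as_intended):
    """Return the name of the procedure where the bug is localized, or None.

    behaves_as_intended(call) stands in for the user's yes/no answer.
    """
    if behaves_as_intended(call):
        return None
    for child in call.children:
        found = localize(child, behaves_as_intended)
        if found is not None:
            return found
    # The call misbehaves, and it either makes no calls or all of its
    # calls meet the user's expectations: the bug is in this procedure.
    return call.name


tree = Call("main", [Call("parse"), Call("compute", [Call("scale"), Call("sum")])])
wrong = {"main", "compute", "sum"}   # procedures the user would answer "no" for
print(localize(tree, lambda c: c.name not in wrong))
```

In this made-up tree the search descends from main through compute and stops at sum, the deepest call whose behavior the user rejects.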


Algorithmic debugging can be considered as an example of a debugging strategy based on some assertion language (in
this case assertions about results of a procedure call). The notion of computation over execution trace introduced in
FORMAN may be a convenient basis for describing such debugging strategies.


6  Conclusions 

In brief, our approach can be explained as "computations over a target program event trace." We expect the advantages
of our approach to be the following:


•  The notion of an event grammar provides a general basis for program behavior models. In contrast with previous
approaches, the event is not a point in the trace but an interval with a beginning and an end.

•  Event grammar provides a coordinate system to refer to any interesting event in the execution history. Program variable
values are attributes of an event's beginning and end. Event attributes provide complete access to each target
program's execution state. Assertions about particular execution states as well as assertions about sets of different
execution states may be checked.

•  The PRECEDES relation yields a partial order on the set of events, which is a natural model for parallel program
behavior.

•  The IN relation yields a hierarchy of events, so the assertions can be defined at an appropriate level of granularity.

•  A language for computations over event traces provides a uniform framework for assertion checking, profiles,
debugging queries, and performance measurements.

•  The access to the complete target program execution history and the ability to formalize generic assertions can be used
in order to define debugging rules and strategies.

•  The fact that assertions and other computations over target program event trace can be separated from the text of the
target program allows accumulation of formalized knowledge about particular programs and about the whole target
language in separate files. This makes it easy to control the amount of assertions to be checked.
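For a sequential trace, the interval view of events and the two basic relations can be illustrated as follows. This Python sketch is an assumption-laden simplification: it derives PRECEDES and IN from step-counter intervals, whereas in FORMAN the relations are defined by the event grammar, and the names Event, precedes, and is_in are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    begin: int   # step counter value when the event starts
    end: int     # step counter value when the event ends


def precedes(a, b):
    """a PRECEDES b: a finishes before b starts."""
    return a.end < b.begin


def is_in(a, b):
    """a IN b: a lies within the interval of a different event b."""
    return a != b and b.begin <= a.begin and a.end <= b.end


assign = Event("ex_stmt", 1, 9)            # hypothetical timing for Y := 3
if_stmt = Event("ex_stmt", 10, 35)         # the if statement from Assertion #4
expr = Event("eval_expression", 20, 29)    # evaluation of ( X - Y )

print(precedes(assign, if_stmt), is_in(expr, if_stmt), precedes(expr, if_stmt))
```

Nested events such as expr and if_stmt are related by IN but not by PRECEDES, which is why PRECEDES gives only a partial order.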


According to [9] and [24], approximately 40-50% of all bugs detected during program testing are logic, structural,
and functionality bugs, i.e. bugs which could be detected by appropriate assertion checking similar to that demonstrated
above.


It appears that the approach initially designed for program behavior modeling may be used in other dynamic system
behavior models as well. The methodology is based on identifying event types representing essential actions performed
within the system, and defining the basic relations PRECEDES and IN for those events (event grammar), and appropriate
event attributes. Then a FORMAN-like language for computations over event traces may be developed to specify
behavior properties, to perform queries and other kinds of dynamic analysis.

This  work  was  supported  in  part  by  NSF  grant  #9S  10732. 
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Static  Analysis  for  Program  Generation  Templates1 


Valdis  Berzins 
Naval  Postgraduate  School 
Monterey  CA  93943  USA 


Abstract


This  paper  presents  an  approach  to  achieving  reliable  cost-effective  software  via  automatic  program 
generation  patterns.  The  main  idea  is  to  certify  the  patterns  once,  to  establish  a  reliability  property  for  all 
of the programs that could possibly be generated from the patterns. We focus here on properties that can
be  checked  via  computable  static  analysis.  Examples  of  methods  to  assure  syntactic  correctness  and 
exception  closure  of  the  generated  code  are  presented.  Exception  closure  means  that  a  software  module 
cannot  raise  any  exceptions  other  than  those  declared  in  its  interface. 


1.  Introduction 

Our goal is to provide cost effective means for creating reliable software. We are addressing the
issue by improving the technology for automatic software generation, with particular attention to
reliability issues.

We take a domain specific view of this process: a domain is a family of related problems addressing
a common set of issues. A domain analysis identifies the problem and issues, formulates a model of these,
and determines a corresponding set of solution methods. Users of the proposed computer-aided software
generation system describe their particular problem using a domain specific problem modeling language
that provides concrete representations of problems in the domain. The system then automatically
determines which solution methods are applicable, customizes them to the specific problem instance
described in the problem modeling language, and then automatically generates a program that will solve the
specified problem.

We seek to provide tool support for the above process that can be applied to many different problem
domains, and that can generate code in any programming language. Therefore we seek uniform and
effective methods for generating software generators of the type described above, given definitions of the
problem modeling language, the target programming language, and the rules for synthesizing solution
programs. A simple architecture for this process is shown in Figure 1.

The specific goals of this paper are: (1) to provide a simple example of a language for expressing
software generation patterns that is specific enough to be used as synthesis rules, and (2) to provide examples of
static analysis methods that address the problems of certifying that all programs which can be
generated from a given set of templates: (1) are syntactically correct and (2) will not raise any exceptions other
than those explicitly specified in an interface description.

This is a step towards a coordinated system of static and dynamic checks, to be performed on
program synthesis rules. Our hypothesis is that the most cost effective way to improve software quality is
to systematically improve and certify the rules used to generate a domain-specific software generator.
This approach directly addresses the issue of correctly implementing given software requirements. It also
indirectly addresses the issue of getting the right requirements, because it should eventually enable rapid
prototyping of product quality systems by problem domain experts, who need not be software experts. If
the requirements are found to be inappropriate, the domain experts will simply update the problem models
and regenerate a new version of the solution software.

We will refer to the software generation patterns as templates. Our rationale for the claim of cost
effectiveness is that the benefits of quality improvements to the templates can be extended to all past and
future applications of the generators by regenerating the generator using the improved templates and
then regenerating the past applications. The regeneration process can be completely automated, thereby
reducing labor costs, eliminating a source of random human errors, and speeding up the process of
repairing a known fault throughout a large family of software systems.


1 This research was supported in part by the U.S. Army Research Office under contract/grant number
35037-MA and 40473-MA, and in part by DARPA under contract #99-F759.
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The  relation  to  the  theme  of  this  workshop  is  that  fast  moving  scenarios  can  be  addressed  by 
automatically  generating  new  variants  of  the  software  that  reflect  changing  issues  in  the  problem  domain. 
Our  approach  should  reduce  the  explicit  quality  assurance  efforts  needed  each  time  the  software  is 
changed.  By  amortizing  the  quality  assurance  effort  applied  to  the  template  over  many  applications  of  the 
same  templates,  we  can  reduce  quality  assurance  costs.  The  benefits  increase  with  the  number  of  systems 
generated  from  the  same  templates. 


This  paper  focuses  on  static  checks  that  can  be  completely  automated.  Our  research  is  also  addressing 
testing  and  debugging  of  program  synthesis  rules  and  proofs  of  rule  properties  that  require  human 
assistance  with  deeper  reasoning.  These  efforts  are  outside  the  scope  of  the  current  paper,  which  is 
organized  as  follows: 

•  Section  2  formalizes  software  generation  patterns  and  defines  a  uniform  construction  to 
obtain  a  template  language  for  any  target  programming  language. 

•  Section 3 describes methods for statically certifying syntactic correctness of generated code,
and gives an example.

•  Section  4  does  the  same  for  analysis  of  exceptions. 

•  Section  5  contains  comparisons  to  previous  work 

•  Section  6  presents  conclusions. 

2.  Template  Languages 

The  purpose  of  a  template  language  is  to  define  software  synthesis  patterns  for  a  given  target 
language.  We  create  such  languages  based  on  a  functional  object  model  of  code  generation  templates.  We 
take  a  functional  (i.e.  side-effect-free)  approach  because  this  simplifies  the  algebraic  basis  of  the  approach 
and supports effective static analysis methods such as those presented in Sections 3 and 4.


We view template languages as extensions of the corresponding target programming languages.
Because many different programming languages are created, we will need many different template
languages. However, all of these can be defined at once by providing a uniform construction such as that
shown in Figure 2.


This is a very simple construction, but it is very expressive. In addition to providing substitution of
actual values for generic parameters, as in the generic units of Ada and the templates of C++, our
construction includes conditionals that are evaluated at code generation time, and the ability to invoke
other templates. Recursion is included.


template_language = {template, formal_def, template_expression}

DEF_TEMPLATE(id[template], type, seq[formal_def], template_expression):
    template  -- where type ∈ target_language

DEF_FORMAL(template_parameter, type): formal_def
    -- declares the type of a formal parameter

template_parameter < {id[any], template_expression}

IF(template_expression, template_expression, template_expression):
    template_expression

APPLY(id[template], seq[template_expression]): template_expression

template_expression < target_language

Figure 2. Template Abstract Syntax


The construction depends heavily on the use of inheritance in object-oriented modeling of
programming languages. The situation is illustrated in Figure 3.


Figure 3. Generic Template Language


In object-oriented modeling, class-wide types2 are viewed as open and extensible. Specifically, each
time we add a subclass with a new constructor, we add more instances to the class-wide type, thus
extending its value set.


We model the abstract syntax of a language using a type for each kind of semantic entity. In a
properly constructed abstract syntax, there should be one such type for each non-terminal symbol. Each
constructor of these types corresponds to a production of the grammar. Subclass relationships, denoted by
<, specify that every instance of the subclass is also an instance of the parent class. Multiple inheritance
is allowed. For example, line 6 of Figure 2 says that every template parameter is a kind of identifier,
and also a kind of template expression. This kind of subclass relationship is used to incorporate
reusable types in a library of programming language building blocks, such as identifiers, and to specialize
reusable concepts to the application, such as template expression. If T is a type and S is a set of types,
T < S means T is a subclass of each element of S. This represents multiple inheritance.


2 This is Ada 95 terminology. The instances of a class-wide type include its direct instances and those of
all its subclasses, transitively.
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Subclassing  is  also  used  to  interface  between  a  target  programming  language  and  its  extensions.  In 
Figure  2,  "target-language"  denotes  the  set  of  types  comprising  the  abstract  syntax  of  the  target  language. 
Figure  4  shows  a  very  simple  example  of  a  target  language  that  illustrates  how  this  works. 

target_language = {stmt, exp}

assign(var, exp): stmt
if(exp, stmt, stmt): stmt

integer < exp  -- integer literals

var < {id[any], exp}  -- program variables

apply(id[function], seq[exp]): exp  -- operations

subtype rule: x < y => id[x] < id[y] where x, y ∈ type

Figure 4. Example: Micro Target Language


The example in Figure 5 defines a code generation pattern that embodies Newton's method for
polynomial evaluation (the nested scheme commonly known as Horner's rule), which is optimal in terms of
number of evaluation steps needed. This is a very
simple  example  of  a  code  generation  pattern  that  is  nevertheless  realistic,  because  it  embodies  a  solution 
method.  The  example  also  illustrates  the  use  of  all  the  constructs  in  the  template  language.  We  use  infix 
syntax  for  the  exp  constructors  *  and  +  to  improve  legibility  (e.g.  x*y  is  short  for  the  term  apply(*,  x,  y)). 

An  additional  benefit  of  considering  the  abstract  syntax  to  be  an  algebra  rather  than  a  tree  is  that  we 
can use well-studied transformation rules. In particular we can associate equational axioms with the
programming  language  types  that  define  normal  forms.  Figure  5  illustrates  the  use  of  such  axioms  as 
rewrite  rules  that  simplify  the  code  produced  by  the  generator  in  a  follow-on  normalization  process.  This 
is  one  way  to  incorporate  optimizations  into  the  program  generation  process,  which  is  useful  for 
unconditional  transformations. 

TEMPLATE evaluate_polynomial (v: var, c: seq[integer]): exp
-- c contains coefficients of a polynomial, lowest degree first
IF not (is_empty (c))  -- use operations of boolean and seq
THEN v * (evaluate_polynomial (v, rest(c))) + first (c)
ELSE 0
END TEMPLATE

Template application evaluate_polynomial(x, [1, 2, 3]) generates
x * (x * (x * 0 + 3) + 2) + 1

Normalization with integer rules i * 0 = 0, i + 0 = i reduces to
x * (x * 3 + 2) + 1

Figure  5.  Example:  Generation  Pattern 


Code generation using the template language is very much like evaluation in a functional
programming  language  with  call-by-value  semantics.  Analysis  of  templates  can  take  advantage  of 
equational  reasoning,  substitution,  and  structural  induction.  The  limitation  to  primitive  recursion 
facilitates  the  latter.  The  recursion  in  the  example  is  structural  because  rest  is  a  partial  inverse  for  the 
sequence  constructor  add  (i.e.  rest(add(x,  s))  =  s). 
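The analogy with functional evaluation can be made concrete. The Python sketch below expands the template of Figure 5 into a term and then normalizes it; it is an illustration under stated assumptions, not the generator described in the paper. Terms are modeled as nested tuples (operator, left, right), the helper names normalize and show are hypothetical, and the rule i + 0 = i is applied in either operand order so that 0 + 3 also reduces.

```python
def evaluate_polynomial(v, c):
    """Expand the template: c holds coefficients, lowest degree first."""
    if c:   # IF not(is_empty(c))
        return ("+", ("*", v, evaluate_polynomial(v, c[1:])), c[0])
    return 0


def normalize(t):
    """Apply i * 0 = 0 and i + 0 = i bottom-up (either operand order)."""
    if not isinstance(t, tuple):
        return t
    op, a, b = t[0], normalize(t[1]), normalize(t[2])
    if op == "*" and (a == 0 or b == 0):
        return 0
    if op == "+" and a == 0:
        return b
    if op == "+" and b == 0:
        return a
    return (op, a, b)


def show(t):
    """Print a term in infix form, parenthesizing sums nested in products."""
    if not isinstance(t, tuple):
        return str(t)
    op, a, b = t

    def operand(u):
        s = show(u)
        needs_parens = op == "*" and isinstance(u, tuple) and u[0] == "+"
        return f"({s})" if needs_parens else s

    return f"{operand(a)} {op} {operand(b)}"


term = evaluate_polynomial("x", [1, 2, 3])
print(show(term))
print(show(normalize(term)))
```

The recursion passes c[1:], which plays the role of rest(c), so each call works on a strictly shorter sequence and the expansion terminates.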

3.  Syntactic  Correctness  of  Generated  Code 

We  treat  the  abstract  syntax  structures  of  the  target  language  as  the  values  of  the  abstract  data  types 
representing  the  programming  language.  We  require  these  types  to  provide  a  pretty  printing  operation  that 
outputs  such  objects  as  text  strings  according  to  the  concrete  syntax  of  the  target  language,  with  a 
readable  format.  Establishing  correctness  of  these  pretty  printing  operations  is  straightforward,  and  in  fact 
their  implementations  can  be  generated  from  an  appropriately  annotated  grammar  for  the  concrete  syntax. 

Given  trusted  pretty  printing  operations  for  the  object  model  of  the  target  language,  syntactic 
correctness  of  the  output  reduces  to  the  type-correctness  of  the  ground  terms  generated  by  the  evaluation 
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of  the  templates.  This  can  be  checked  using  a  simple  type  system  for  the  template  language  and 
conventional  type  checking  methods.  Note  that  we  are  referring  to  the  types  associated  with  the  signatures 
of  the  constructors  in  the  object  model  of  the  target  programming  language,  rather  than  the  types  within 
the target programming language, which may not even be a typed language. The process is illustrated in
Figure 6. The computed type annotations are shown in italics. The type annotations associated with the
implicit induction step, where the type signature of the template itself is used, are highlighted in bold
italics. The indentations of the type annotations reflect the structure of the derivation.

TEMPLATE evaluate_polynomial (v: var, c: seq[integer]): exp
IF not (is_empty (c: seq[integer]): boolean): boolean
THEN + ( * ( v: var,
             evaluate_polynomial
               (v: var,
                rest(c: seq[integer]): seq[integer]): exp
           ): exp,
         first (c: seq[integer]): integer
       ): exp
  -- term form of v * evaluate_polynomial (v, rest(c)) + first (c)
ELSE 0: integer
END TEMPLATE

Types conform because integer < exp and var < exp

Relevant signatures: +(exp, exp): exp, *(exp, exp): exp,
first(seq[T]): T, rest(seq[T]): seq[T],
is_empty(seq[T]): boolean, not(boolean): boolean

Figure 6. Example: Syntactic Correctness of Generated Code

Note that induction has been carried out implicitly, as a routine step of the type checking calculation.
This is sufficient to establish partial type correctness of the templates, which implies syntactic correctness
of all code that could be generated by the template. It does not automatically guarantee total correctness
because we still have the possibility that evaluation of the template might fail to terminate.

Total correctness is established by the type check if we check that all recursions are primitive. The
example satisfies this condition because rest is a partial inverse of the compound sequence constructor:
rest(add(x, s)) = s. This means that the induction is in fact structural, and hence that evaluate_polynomial
is total. Thus the template will produce syntactically correct code for all input values that conform to the
type signature of evaluate_polynomial.

We note that given declarations of the target language constructors that define the abstract syntax and
the corresponding partial inverse operations, it is straightforward to automatically check that all recursive
calls are primitive with respect to any given parameter position. This implies that structural induction can
be applied uniformly and completely automatically in this context. Furthermore, our experience suggests
that structural recursions are sufficient to define the code generation templates needed in practice, and that
template designers can live within the restriction to structural recursions without undue hardships.
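Both checks discussed in this section can be sketched compactly. The Python code below is a minimal illustration, not the paper's checker: template bodies are modeled as nested tuples, type_of and recursion_is_structural are hypothetical names, the IF rule simply requires both branches to conform to the declared result type, and a recursive call is typed by the template's own signature (the implicit induction step).

```python
SUBTYPES = {("integer", "exp"), ("var", "exp")}


def conforms(actual, expected):
    return actual == expected or (actual, expected) in SUBTYPES


def type_of(term, env, sig):
    """Compute the type of a template term, or raise TypeError."""
    name, params, result = sig
    if isinstance(term, int):
        return "integer"
    if isinstance(term, str):
        return env[term]                      # formal parameter
    head, *args = term
    types = [type_of(a, env, sig) for a in args]
    if head == "IF":
        if types[0] != "boolean" or not all(conforms(t, result) for t in types[1:]):
            raise TypeError("ill-typed IF")
        return result
    if head in ("first", "rest", "is_empty"):
        seq = types[0] if types else None
        if len(types) != 1 or not (isinstance(seq, tuple) and seq[0] == "seq"):
            raise TypeError(f"{head} expects one sequence")
        return {"first": seq[1], "rest": seq, "is_empty": "boolean"}[head]
    if head in ("+", "*"):
        expected, returned = ["exp", "exp"], "exp"
    elif head == "not":
        expected, returned = ["boolean"], "boolean"
    elif head == name:                        # induction step: use own signature
        expected, returned = params, result
    else:
        raise TypeError(f"unknown constructor {head}")
    if len(types) != len(expected) or not all(map(conforms, types, expected)):
        raise TypeError(f"ill-typed arguments to {head}")
    return returned


def recursion_is_structural(term, name, position, param):
    """True if every call to name passes rest(param) at the given position."""
    if not isinstance(term, tuple):
        return True
    head, *args = term
    if head == name and args[position] != ("rest", param):
        return False
    return all(recursion_is_structural(a, name, position, param) for a in args)


sig = ("evaluate_polynomial", ["var", ("seq", "integer")], "exp")
env = {"v": "var", "c": ("seq", "integer")}
body = ("IF", ("not", ("is_empty", "c")),
        ("+", ("*", "v", ("evaluate_polynomial", "v", ("rest", "c"))), ("first", "c")),
        0)
print(type_of(body, env, sig), recursion_is_structural(body, sig[0], 1, "c"))
```

A body that passes both checks is type-correct for every expansion and is guaranteed to terminate, which is the combination the text calls total correctness.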


4. Exception Closures for Generated Code

One  common  source  of  software  failure  is  unhandled  exceptions.  This  section  explains  a  method  for 
certifying  that  all  programs  generated  from  a  given  template  cannot  generate  any  unhandled  exceptions 
when placed in a context that handles a specified set of exceptions.

Our  approach  is  to  refine  the  type  system  to  record  the  set  of  exceptions  that  might  be  raised  by  the 
evaluation of any expression of the target language. A similar structure can be used to analyze the set of
exceptions  that  might  be  raised  by  execution  of  a  statement  of  the  target  language. 


The refinement replaces the single target language type exp with a parameterized family of types
exp[set[exception]]. The intended interpretation of this type structure is that evaluation of an expression
of type exp[S] might raise an exception e only if e ∈ S. Since we do not require all exceptions in S to be
producible, this family of types has a rich subclass structure defined by the following relation:

S1 ⊆ S2 => exp[S1] < exp[S2]

The type signatures of an operation are specified explicitly for argument expression types that cannot
raise any exceptions, and are extended to all other types by the following rule, which describes the
essential pattern for propagating exceptions:

f(exp[∅]): exp[S1] => f(exp[S2]): exp[S1 ∪ S2]

The  rule  for  operations  with  multiple  arguments  is  similar.  Similar  rules  apply  to  language  constructs 
representing  exception  handlers.  Exception  handlers  follow  rules  of  the  form 

(TRY exp[S1] CATCH e USE exp[S2]): exp[(S1 - {e}) ∪ S2].
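
The propagation and handler rules can be read as a bottom-up computation over expression trees. The Python sketch below is an illustration under stated assumptions: expressions are nested tuples, the table BASE and the exception names "ovfl" and "div0" are hypothetical, and a handler is encoded as ("try", body, caught, handler).

```python
from typing import FrozenSet, Union

Expr = Union[int, str, tuple]

# Exceptions an operator may raise when its arguments raise none.
# (Hypothetical table; "ovfl" and "div0" are illustrative names.)
BASE = {"+": frozenset({"ovfl"}), "*": frozenset({"ovfl"}), "/": frozenset({"div0"})}


def exceptions(e: Expr) -> FrozenSet[str]:
    """Smallest S such that `e` has type exp[S] under the rules above."""
    if not isinstance(e, tuple):
        return frozenset()                      # literals and variables raise nothing
    if e[0] == "try":                           # ("try", body, caught, handler)
        _, body, caught, handler = e
        return (exceptions(body) - {caught}) | exceptions(handler)
    result = BASE[e[0]]                         # signature for exception-free arguments
    for arg in e[1:]:
        result = result | exceptions(arg)       # union in what the arguments may raise
    return result


def conforms(s1: FrozenSet[str], s2: FrozenSet[str]) -> bool:
    """Subtype test: exp[S1] < exp[S2] whenever S1 is a subset of S2."""
    return s1 <= s2
```

For instance, a division wrapped in a handler for "div0" whose replacement expression is a literal has an empty exception set, so it conforms to every type in the family.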

Figure  7  shows  the  exception  analysis  for  our  running  example.  The  parts  added  to  the  version  in 
Figure  6  are  underlined. 


TEMPLATE evaluate_polynomial(v: var, c: seq[integer]): exp[{ovfl}]

IF not(is_empty(c: seq[integer]): boolean): boolean
THEN +(*(v: var,
         evaluate_polynomial(v: var,
                             rest(c: seq[integer]): seq[integer]): exp[{ovfl}]): exp[{ovfl}],
       first(c: seq[integer]): integer): exp[{ovfl}]

- term form of v * evaluate_polynomial(v, rest(c)) + first(c)

ELSE 0: integer
END TEMPLATE

Types conform because integer < exp[∅] < exp[{ovfl}] and
var < exp[∅] < exp[{ovfl}]

Relevant signatures: +(exp, exp): exp[{ovfl}], *(exp, exp): exp[{ovfl}],
first(seq[T]): T, rest(seq[T]): seq[T], is_empty(seq[T]): boolean, not(boolean): boolean


Figure  7.  Exception  Closure  of  Generated  Code 


Note  that  we  require  the  author  of  the  template  to  specify  in  the  type  declaration  of  a  template  the  set 
of  exceptions  the  generated  expression  is  allowed  to  raise.  This  acts  as  an  induction  hypothesis  in  our 
exception analysis, which is used when analyzing the recursive call of evaluate_polynomial. It also
provides  useful  information  for  the  user  of  the  generated  code. 

The analysis shown in the figure establishes a partial exception closure: it guarantees that
expressions generated by the template can raise at most the exception ovfl, which represents integer
overflow.
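
The role of the declared exception set as an induction hypothesis can be sketched as follows. This Python fragment is an assumption-laden illustration, not the authors' analysis: the tuple encoding, the tables BASE and META, and the functions closure and certify are hypothetical. The declared set is simply assumed for every recursive call, and the body is then checked against it.

```python
from typing import FrozenSet, Union

Expr = Union[int, str, tuple]

# Run-time exception sets of target-language operators (illustrative).
BASE = {"+": frozenset({"ovfl"}), "*": frozenset({"ovfl"})}
# Generation-time operations: they build or inspect syntax and contribute
# no run-time exceptions to the generated expression.
META = {"first", "rest", "is_empty", "not"}


def closure(e: Expr, template: str, declared: FrozenSet[str]) -> FrozenSet[str]:
    """Run-time exceptions of the code generated for `e`."""
    if not isinstance(e, tuple):
        return frozenset()
    if e[0] == template:
        return declared                 # induction hypothesis for the recursive call
    if e[0] == "if":                    # ("if", guard, then_branch, else_branch)
        return closure(e[2], template, declared) | closure(e[3], template, declared)
    if e[0] in META:
        return frozenset()
    result = BASE[e[0]]
    for arg in e[1:]:
        result = result | closure(arg, template, declared)
    return result


def certify(body: Expr, template: str, declared: FrozenSet[str]) -> bool:
    """Partial closure holds if the body raises nothing beyond `declared`."""
    return closure(body, template, declared) <= declared
```

With the body of the running example, the declaration {ovfl} is certified, while an empty declaration is rejected because + and * may overflow.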

To  establish  a  total  exception  closure,  we  have  to  address  clean  termination  of  the  template  expansion 
at  program  generation  time.  The  primitive  recursion  check  explained  in  the  previous  section  guarantees 
there  will  be  no  infinite  recursions,  so  that  termination  is  guaranteed.  However,  for  clean  termination,  we 
must  also  check  that  evaluation  of  the  template  will  not  raise  any  exceptions  at  program  generation  time. 

Note  that  the  analysis  in  Figure  7  addresses  run-time  exceptions.  When  viewed  as  constructors  of  the 
abstract  syntax,  +  and  *  are  total  operations.  Overflow  exceptions  can  occur  only  when  those  expressions 
are  evaluated,  not  when  they  are  constructed. 


The sequence operators first and rest are different: they are partial query methods of the abstract
syntax,  not  total  constructors.  If  applied  to  an  empty  sequence,  they  raise  a  sequence  underflow  exception. 
However,  this  can  occur  only  at  program  generation  time,  not  at  run  time. 

To certify clean termination of a template at program generation time requires a type refinement to
record  sets  of  possible  exceptions  and  an  additional  kind  of  type  refinement  to  record  domains  of  partial 
methods  such  as  first  and  rest.  We  can  introduce  a  subtype  nseq[T,  S]  <  seq[T,  S]  consisting  of  the 
nonempty  sequences,  and  refine  the  signatures  of  the  partial  sequence  operations  first  and  rest  as  follows. 

first(nseq[T, ∅]): T[∅], rest(nseq[T, ∅]): seq[T, ∅]

first(seq[T, ∅]): T[{seq_underflow}], rest(seq[T, ∅]): seq[T, {seq_underflow}]

Type  analysis  requires  a  bit  of  inference  in  this  case,  because  we  have  to  use  the  guard  of  the 
template language conditional IF together with the rule

s: seq[T, S] and not is_empty(s) => s: nseq[T, S]

This  inference  is  easy  because  the  guard  matches  the  subtype  restriction  predicate  for  nseq[T]. 

This  match  did  not  occur  by  accident  -  the  purpose  of  the  guard  is  precisely  to  ensure  that  the 
operations  first  and  rest  are  used  only  within  their  domain  of  definition.  In  the  interests  of  being  able  to 
produce  certifiably  robust  code,  we  claim  that  it  would  not  be  unduly  burdensome  to  require  that  template 
designers  associate  domain  predicates  with  all  partial  operations,  and  use  those  domain  predicates 
explicitly  in  guards  whenever  they  are  needed  to  ensure  the  partial  operators  are  used  within  their  proper 
domains  of  definition.  For  example,  first  could  be  associated  with  a  domain  predicate 

first_ok(seq[T]): boolean where
first_ok(s) = not(is_empty(s)).

This  would  enable  a  fast  and  shallow  analysis  of  guard  conditions  to  certify  absence  of  exceptions  in 
cases  like  this.  Some  such  restriction  is  necessary  for  practical  engineering  support  because  the  problem 
of  checking  whether  an  unconstrained  guard  condition  implies  the  domain  predicates  of  arbitrary  guarded 
partial  operations  is  undecidable. 

An  alternative  is  an  exception  analysis  that  includes  exceptions  in  the  closure  even  in  cases  where  the 
guard  condition  ensures  they  will  never  arise.  We  suggest  that  it  is  more  practical  to  handle  a  common 
subset  of  efficiently  recognizable  forms,  and  to  ask  designers  to  work  within  the  constraints  of  those 
recognizable  forms.  We  believe  this  would  be  less  burdensome  than  the  alternative  of  manually  analyzing 
the  cases  where  a  type  check  insensitive  to  guard  conditions  would  nominate  exceptions  that  cannot  in 
fact occur, and that it would lead to more robust software by making it practical to do complete analysis
of  exception  closures.  For  example,  we  could  require  the  example  of  Figure  7  to  be  written  in  a  stylized 
form  that  looks  like  the  following: 


IF first_ok(c) and rest_ok(c)
THEN ... first(c) ... rest(c) ...

A  similar  type  check  would  have  to  be  applied  to  the  implementations  of  first  and  rest  to  ensure  that  they 
would  in  fact  terminate  cleanly  whenever  the  domain  predicates  are  true. 
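
The shallow guard analysis described above can be approximated by purely syntactic matching. The following Python sketch is illustrative only: the tuple encoding, the table DOMAIN_PREDICATE (written here with underscores as first_ok and rest_ok) and the helper names are assumptions. It reports every use of a partial operation in the THEN branch whose domain predicate, applied to the same argument, is not a conjunct of the guard.

```python
from typing import Iterator, Set, Tuple, Union

Node = Union[int, str, tuple]

# Hypothetical association of each partial operation with its domain predicate.
DOMAIN_PREDICATE = {"first": "first_ok", "rest": "rest_ok"}


def conjuncts(guard: Node) -> Iterator[Node]:
    """Split a guard of the form p1 and p2 and ... into its conjuncts."""
    if isinstance(guard, tuple) and guard[0] == "and":
        for part in guard[1:]:
            yield from conjuncts(part)
    else:
        yield guard


def partial_uses(node: Node) -> Iterator[Tuple[str, Node]]:
    """Yield (operation, argument) for every use of a partial operation."""
    if isinstance(node, tuple):
        if node[0] in DOMAIN_PREDICATE:
            yield node[0], node[1]
        for child in node[1:]:
            yield from partial_uses(child)


def unguarded(guard: Node, then_branch: Node) -> Set[Tuple[str, Node]]:
    """Partial-operation uses whose domain predicate is not a guard conjunct."""
    established = set(conjuncts(guard))
    return {
        (op, arg)
        for op, arg in partial_uses(then_branch)
        if (DOMAIN_PREDICATE[op], arg) not in established
    }
```

An empty result certifies the branch; anything else names the operations that could raise a generation-time exception. Because the match is syntactic, a guard written as not(is_empty(c)) would not be recognized by this sketch, which is the price of keeping the check fast and decidable.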

5.  Comparisons  to  Previous  Work 

One  of  our  contributions  has  been  to  formalize  and  abstract  the  idea  of  a  program  generation  pattern, 
to  make  it  independent  of  the  details  of  the  target  programming  language  and  the  process  of  instantiating 
the patterns. The purpose of this was to create a context in which systematic analysis of program
generation  patterns  becomes  possible  and  in  some  cases  becomes  decidable. 

Program  generation  patterns  have  been  evolving  for  a  long  time.  Macros  are  an  early  form  of  the 
idea.  However,  macros  are  notoriously  difficult  to  analyze,  partially  because  they  traditionally  operate  on 
uninterpreted  text.  This  makes  the  connection  between  macro  definitions  and  the  behavior  they 
ultimately  denote  complicated  and  potentially  very  indirect.  The  macros  in  LISP  are  an  improvement 
because  they  are  based  on  abstract  syntax  trees  rather  than  characters.  However,  in  this  context  a  second 
source  of  complexity  becomes  apparent:  a  macro  can  expand  to  produce  another  macro,  and  the  number 
of expansion steps before the generated source code actually appears is potentially unbounded. This
makes the system very difficult to analyze. At the other extreme are the generic units of Ada. These are
strongly typed, clearly connected to the abstract syntax of the language, and the results of instantiating
them are easy to analyze. However, they do not allow conditional decisions at instantiation time, and are
restricted in the sense that the abstract syntax trees of all possible instantiations have exactly the same
shape, up to substitution for the formal parameters of the pattern. A language-independent version of the
idea can be found in [5], although this appears to be largely text-based.


One aspect of our approach is to model languages as algebras rather than as abstract syntax trees.
A hint of this idea appears in [4], although it is not exploited there for enabling analysis to any significant
degree. The work of the CIP group [1] develops this idea further and takes advantage of the reasoning
structures that come with the algebraic modeling approach, such as term rewriting and generation
induction principles. This suggests extension to a full object-oriented view, which includes inheritance.
The Refine system is the earliest context we know of where grammars are treated as object models with
potential inheritance structures, although the documentation does not give any hint about the significance
of this capability. In this paper we demonstrate the usefulness of algebraic models of syntax with
inheritance for defining language extension transformations that can be applied to all possible target
languages.

Another theme is lightweight inference [2]. We have demonstrated that some useful types of static
analysis for program generation patterns can be performed via computable and indeed reasonably
efficient methods. The processes described here can be implemented using technologies typically used in
compilers, such as object attribution rules; they terminate for all possible inputs, and do so in polynomial
time. We believe this approach will scale up to large applications, and are currently working out the
details to support a tight analysis of the efficiency of the process.

This paper has explored static analysis of meta-programs to check syntactic correctness and
exception closure of the generated code. Another kind of static analysis in this family, type checking
to ensure the type correctness of the generated code, is considered by another paper in this
proceedings pS].

6.  Conclusions 

We  believe  that  formal  models  of  program  generation  templates  can  support  a  variety  of  quality 
improvement  processes  that  can  help  achieve  cost-effective  software  reliability.  This  paper  has  presented 
a  simple  example  of  such  a  formal  model  and  two  such  quality  improvement  processes,  certification  of 
syntactic  correctness  and  freedom  from  unexpected  exceptions  for  all  programs  that  can  be  generated 
from  a  given  program  generation  pattern.  We  expect  the  greatest  advantages  of  this  approach  to  be 
realized when it is applied to build flexible and reliable systems in a product line approach. This
approach should be augmented with systematic methods for domain analysis that culminate in the
development of a domain-specific library of solutions embodied in a domain-specific software
architecture that is populated with components produced by model-based software generators. When the
technology matures, it should become possible for problem domain experts to specify their problem
instances  in  terms  of  familiar  problem  domain  models,  and  to  have  reliable  software  solutions  to  their 
problems  automatically  generated,  without  direct  involvement  of  computer  experts. 

The  economic  advantage  of  this  approach  comes  from  the  ability  to  automatically  reap  the  benefits  of 
each  quality  improvement  for  all  past  and  future  instantiations  of  the  template  (if  past  applications  are 
regenerated). We believe that it will be profitable to explore methods for lifting many known program
analysis techniques from the level of individual programs to the level of program generation patterns.
This should be explored for a variety of issues, including certifying absence of references to
uninitialized variables, absence of deadlock, and many others, perhaps ultimately extending to
template-based proof of postconditions and program termination for generated programs.

To  make  this  vision  practical,  many  engineering  issues  must  be  addressed,  including  presentation 
issues,  methods  for  lightweight  inference  [2]  and  support  for  transforming  and  enhancing  complex  sets  of 
analysis  rules.  Other  issues  include  systematic  methods  for  dynamic  analysis,  testing,  and  debugging  of 
program  generation  rules.  It  is  not  reasonable  to  expect  progress  to  occur  in  an  instantaneous  quantum 
leap  to  perfection.  A  realistic  process  is  a  gradual  one,  where  simple  sets  of  program  generation  rules  are 
deployed,  and  gradually  tuned,  improved,  certified,  and  extended.  A  key  issue  is  enabling  rule 
enhancement  and  exception  closure  extension  without  invalidating  all  previous  effort  on  analysis  and 
certification  of  the  previous  versions. 

The difference between the program generation approach proposed here and current compiler
generation  tools  is  the  associated  static  analysis  capabilities  for  the  program  generation  rules.  It  is 
possible  that  in  the  future,  ultra-reliable  compilers  will  be  built  using  techniques  derived  from  those 
introduced  in  this  paper. 

Swapan Bhattacharya
Department of Computer Science & Engineering
Naval Post-Graduate School
Monterey
CA, USA

email: swapan@cs.nps.navy.mil

Abstract: This paper aims at structuring the detection of
different types of stuck-at faults for a wide range of
Multistage Interconnection Networks (MINs). The results
reported so far in this respect are mainly based on direct
combinatorial analysis of the concerned networks, with
very little consideration of the modelling aspects.
Graphical representation coupled with well-defined
semantics that allow formal analysis has already established
the Petri net as an effective tool for modelling dynamic
systems. However, the existing variants of high level nets
have certain limitations in modelling the dynamic behaviour
of mapping a permutation through a MIN and in further
analysis of the same. This inspired the authors to
propose two new high level net models, called MP-net
and S-net, in their earlier works. The S-net model uses
tokens to hold and propagate information, apart from
controlling the firing of events. It uses two different types of
places and two different types of transitions, as defined
subsequently. In this paper, we concentrate on the
detection of faults in MINs using this S-net model.

Keywords

Petri Net, MIN, Stuck-at-fault, S-net, Data Place, Control Place

I. Introduction

Generalized Stochastic Petri Net (GSPN) is a performance
analysis tool [11] based on the graphical system
representation typical of Petri nets, in which some
transitions are timed, while others are immediate.
Distributed, parallel and real-time systems may be
modelled using GSPNs. However, for any large system
comprising a large number of components, the time
distributions and relations between components are often
quite complex [07, 08]. This size and complexity is
reflected in the corresponding GSPN models.

The capability of incorporating time as a parameter in net
based models has been provided with the
introduction of Time Petri Nets [13] and Timed Petri Nets
[12, 14, 15]. Timed Petri nets are derived from Petri
nets by associating a finite firing duration with each
transition. The classical firing rule of Petri nets is thus
modified to account for the time taken to fire a transition
and also to express that a transition must fire as soon as it
is enabled. Time Petri Nets (TPN) are more general in the
sense that a Timed Petri net can be modelled by using a
Time Petri net, but the reverse is not true. For both of these
models, firing of a transition is a non-atomic operation.
The firing is said to be in progress between a start-firing
event and an end-firing event.

In  the  context  of  MINs,  binary  values  are  used  to  represent 
information  pertaining  to  data  as  well  as  control.  A  study 
of  different  variants  of  high  level  nets,  as  discussed  above, 
indicates  that  for  modelling  different  processing  elements 
of  distributed  computation,  some  additional  flexibility  is  to 
be  incorporated  in  the  basic  modelling  tool  to  take  care  of 
variations  in  structures  and  functionality  of  these  hardware 
elements. 

In the Modified Petri Net (MP-net) model [03], as defined
by us earlier, two different types of Places and Transitions
are used. The MP-net model for an NxN network
consisting of O(log2 N) stages would involve O(N log2 N)
subnets, one for every 2x2 cross-bar
switch that constitutes the MIN. The total number of Data
and Control places, as well as the number of Controlled
transitions, will therefore be O(N log2 N). This would lead to
an unmanageable and complex situation for the description
of a large system. Thus it has been felt that the proposed
MP-net model requires further compactness. The
stochastic behavior of MP-net is coupled with the
properties of Colored Petri nets [10] to propose a new
powerful high level net called S-net. This has been achieved
by equipping each token with an attached data value called
the token color. In S-net, there is a significant
improvement in the total number of places as well as transitions
compared with MP-net. Both redundant path MINs like Benes
and non-redundant path MINs like Omega or Baseline
have been modelled using S-net [01] [03].

Essentially a variant of Coloured and Stochastic nets, the
S-net has been established as an ideal tool for modelling
any element that has to handle two different types of
signals in repetitive, modular units. It has already been
used to model different types of MINs, e.g. Omega, Benes,
Baseline networks, etc. [03]. It has also been found that
using S-net, a MIN of NxN size can be modelled with
only 3N/2 places, as against O(N log2 N)
switching elements for the corresponding MIN. As for
studying MINs, the results reported so far [02], [04],
[07] are mainly based on direct combinatorial analysis of
the concerned networks, with very little consideration
of the modelling aspects. In the present paper, we
concentrate on the detection of faults in MINs using this
S-net model. The definition of S-net and some of the
relevant terminology essential to understanding the
actual problem of permutation mapping and fault detection
using the model are presented in the following section.
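
To give a feel for the size comparison, the short Python sketch below tabulates the 3N/2 figure quoted above against the switch count of an Omega network. It assumes that N is a power of two and uses the standard fact that an NxN Omega network has log2 N stages of N/2 two-by-two switches; the function names are illustrative.

```python
def snet_places(n: int) -> int:
    """Places quoted in the text for an N x N MIN modelled as an S-net: 3N/2."""
    return 3 * n // 2


def omega_switches(n: int) -> int:
    """2x2 switches in an N x N Omega network: log2(N) stages of N/2 switches."""
    stages = n.bit_length() - 1          # exact log2 for a power of two
    return (n // 2) * stages


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, snet_places(n), omega_switches(n))
```

The place count grows linearly in N, while the number of switching elements grows as N log2 N, so the gap widens as the network gets larger.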

II.  Definition  and  Terminology 

2.1  S-net  Model 

An S-net model uses two different types of places and
two different types of transitions, which enables it to handle
data and control signals as two separate entities. An S-net is
represented by a seven-tuple (D, C, P, Tc, Ti, I, O), where

D = {dj : dj is a Data place};

A data place holds exactly one token at any instant.
The  token  value  is  a  positive  integer  that  indicates  the 
information  held  by  the  element  being  modelled.  A  token 
stored  in  Data  place  is  not  used  to  decide  the  flow  of 
Control.  A  Data  place  is  always  safe. 

C = {cj : cj is a Control place};

A control place holds tokens to enable the corresponding
Controlled transition for firing. A token value is
represented by an ordered pair <x,y>, where x represents
the token color and y is the control value, typically 0 or 1 for
two logical states. The number of tokens in a Control place
must not exceed the number of different colors used in the
model, and there can be only one token of a particular
color in a single Control place. A Control place is
k-bounded, where the maximum number of colors in the
model is k.

P = {p : p is a color};

Tc = {t : t is a Controlled transition};

A  Controlled  transition  can  have  one  and  only  one  Control 
place  at  its  input  and  the  transition  is  enabled  and  fired  in 
presence  of  some  token  of  the  same  color  as  that  for  the 
current  stage,  having  some  pre-defined  control  value  in  the 
corresponding  input  Control  place. 

Ti = {t : t is an Immediate transition};

An  Immediate  transition  is  enabled  and  fired  irrespective 
of  the  presence  of  token  in  its  input  place.  In  fact,  none  of 
the  input  places  for  an  Immediate  transition  is  a  Control 
place.  An  Immediate  transition  is  fired  in  between  a  start 
time  and  an  end  time  in  a  stochastic  manner. 

I : {Tc, Ti} -> D^∞ is the Input function, a mapping from
transitions to bags of Input places.

O : {Tc, Ti} -> D^∞ is the Output function, a mapping from
transitions to bags of Output Data places.

The different sets of places and transitions specified in the
definition of S-net are disjoint. Unlike MP-net, in the
proposed S-net model the same Data places hold
different data values for different colors, as indicated by
some member of the Color set P. Similarly, the token
value to be stored in a Control place depends on the color.
The firing rules for the two types of transitions are very
similar to those for the MP-net, except that in the case of S-net
the color of the token is considered. Whenever a
Controlled transition is fired in color p, tokens in its input
Data places are transferred to the corresponding output Data
places following the directed arcs. The token of color p in the
Control place is removed after the
Controlled transition is fired. On the other hand, an
Immediate transition connecting an input Data place Dk to
an output Data place Dm for color p transfers the token of
Dk to Dm on its firing.


2.2  Properties  of  S-net 

The S-net (D, C, P, Tc, Ti, I, O) has been defined to
structure the performance analysis of MPP systems and
some of their subsystems by modelling them with
it. Before this, some of the basic properties of S-net
are discussed in the present section.

2.2.1 Marking : The presence of token values in
places, at an instant, is called the marking of the S-net model.
There will be two separate sets of markings, μD(d1, d2, ..., dD)
for D Data places and μC(c1, c2, ..., cC) for C
Control places, such that dk for k ∈ [1..D] is some positive
integer a if the corresponding Data place holds a token of
value a. On the other hand, the marking of a Control place ck
for k ∈ [1..C] is a set of ordered pairs <a,b>, where a is the
token color and b is the control value.

In case of a 2x2 cross-bar switch, the control value is
either 0 or 1, indicating the through and crossed states of
the switch respectively. The number of elements in the set
of ordered pairs ck must not exceed the number of different
colors used in the model, and there can be only one token
of a particular color in a single Control place. When a
Controlled transition is fired in a color p, the token of the
same color in its input Control place is consumed. Thus,
after the last set of Controlled transitions in an S-net
model is fired, all the Control places are found empty.
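
The firing of a Controlled transition for a single 2x2 switch can be sketched in a few lines of Python. This is a simplified illustration, not the formal S-net semantics: the class Switch, the function fire_controlled, and the representation of markings as dictionaries are assumptions, and timing and Immediate transitions are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Switch:
    """One 2x2 cross-bar switch: two input and two output Data places."""
    inputs: List[str]
    outputs: List[str]
    # Control place marking: at most one token per color, color -> control value.
    control: Dict[int, int] = field(default_factory=dict)


def fire_controlled(switch: Switch, data: Dict[str, int], color: int) -> bool:
    """Fire the switch's Controlled transition in `color`, if it is enabled.

    Control value 0 passes tokens straight through; 1 crosses them.
    Returns False, leaving the marking unchanged, when no token of the
    given color is present in the Control place.
    """
    if color not in switch.control:
        return False
    value = switch.control.pop(color)        # the control token is consumed
    a, b = (data.pop(name) for name in switch.inputs)
    if value == 1:
        a, b = b, a                          # crossed state
    data[switch.outputs[0]], data[switch.outputs[1]] = a, b
    return True
```

Storing the Control place as a dictionary keyed by color enforces the rule of at most one token per color, and once every color has fired the dictionary is empty, matching the observation that all Control places end up empty.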

2.2.2 Initial State Definition : The initial state of an
S-net is defined as a marking of Data and Control places.
Initially, all the Data places used in a model hold
one token each. In case an NxN multistage interconnection
network is being modelled, the input permutation is stored
as the initial state of the set of N Data places. A Control
place, on the other hand, is initialised with k different
tokens (k ≤ m), for a maximum of m colors being
used in the model. A Controlled transition is enabled at
color p if the corresponding input Control place is
initialized with a token of color p with the appropriate control
value.

2.2.3 Boundedness : A place in a net is Safe if the number of tokens
in that place never exceeds one. The Data places in an S-net are
therefore Safe, by definition. Safeness is a special case of the
more general Boundedness property. A place is said to be k-safe or
k-bounded if the number of tokens in that place never exceeds an
integer value k. A place that is 1-safe is simply called a Safe
place. The Control places used in an S-net are therefore p-bounded,
where p is the maximum number of colors in the model. As all the
places in an S-net are bounded, the S-net itself is bounded.

2.2.4 Reachability : The initial marking of the Data places is
changed as different Controlled and Immediate transitions are fired
in different colors in an S-net model. A state $\mu_D$ is said to be
reachable if there exists a sequence of firings of Controlled and
Immediate transitions that transforms the initial marking of the
Data places into $\mu_D$. The reachability set may be defined as the
set of all markings reachable from the initial marking. Reachability
analysis on an S-net model may be performed without direct
consideration of the changes in the marking of the Control places.

III. Permutation mapping and Control matrix

A permutation P consists of several individual transmissions
between input and output lines at the opposite ends of the MIN. A
Control place may hold a token signifying the crossed state of the
corresponding 2x2 switch. For example, suppose that the Control
place C1 holds a token in pass 1. This shifts the content of Data
place D1 into D2 at the end of pass 1. Before the second pass
begins, the time-independent transitions cascading the basic blocks
are fired. This takes the original content of D1 on to D3. Thus,
after the execution of the second pass, the content of the first
input line of the MIN is finally mapped onto the third or fourth
output line, depending upon the content of C2 at pass 2.


(Figure  1  :  S-net  model  for  a  4x4  Omega  Network) 


Thus, it may be inferred that the presence of a token in C1 for
pass 1 and the absence of a token in C2 for pass 2 enforce a
transmission between input line 1 and output line 3 of the Omega
network under consideration.
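
The two-pass walk-through above can be traced with a short script. This is only a sketch under stated assumptions: the names `shuffle` and `run_passes` are hypothetical, the cascading Immediate transitions are assumed to implement a perfect shuffle (which agrees with the D2 to D3 move described in the text), and a don't-care control entry is treated as "through".

```python
X = None  # don't-care entry, treated here as "no token" (through state)


def shuffle(data):
    """Assumed cascade wiring: perfect shuffle of the Data places."""
    n = len(data)
    out = [None] * n
    for i, token in enumerate(data):
        # Place i moves to 2i mod (n-1); the last place stays where it is.
        j = i if i == n - 1 else (2 * i) % (n - 1)
        out[j] = token
    return out


def run_passes(data, control):
    """Fire the Controlled transitions pass by pass.

    control[p][j] == 1 means Control place C(j+1) holds a token in pass p+1,
    i.e. the switch over Data places D(2j+1), D(2j+2) is crossed.
    """
    data = list(data)
    for p, row in enumerate(control):
        for j, crossed in enumerate(row):
            if crossed:
                data[2 * j], data[2 * j + 1] = data[2 * j + 1], data[2 * j]
        if p < len(control) - 1:
            data = shuffle(data)  # Immediate transitions between passes
    return data


# Token in C1 for pass 1, no token in C2 for pass 2: input 1 reaches output 3.
outputs = run_passes([1, 2, 3, 4], [[1, X], [X, 0]])
print(outputs.index(1) + 1)  # 3
```

Setting the pass 2 entry for C2 to 1 instead moves the same input to output line 4, as stated in the text.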

The effect of the presence or absence of a token in C1 for pass 2
and in C2 for pass 1 can be neglected. A matrix representation

$$\begin{pmatrix} 1 & x \\ x & 0 \end{pmatrix}$$

may be proposed to depict the state of the model for this particular
link. Here 1 represents the presence of a token in the corresponding
Control place, 0 represents the absence of a token in the same, and
x is a don't-care symbol. A matrix like the one presented above may
be termed a Control matrix. For any NxN MIN, the dimension of the
Control matrix is (k x m), where k is the number of passes and m is
the number of Control places in the model, which in any case is N/2.

An entry $C_{ij}$, with $i \in [1..k]$ and $j \in [1..m]$, in the
Control matrix reflects the presence or absence of a token in
Control place Cj for pass i. The Control matrices for a few links
are presented below :


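link 1 -> 3 : $\begin{pmatrix} 1 & x \\ x & 0 \end{pmatrix}$

link 1 -> 4 : $\begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix}$

link 2 -> 1 : $\begin{pmatrix} 1 & x \\ 0 & x \end{pmatrix}$

link 2 -> 2 : $\begin{pmatrix} 1 & x \\ 1 & x \end{pmatrix}$

link 3 -> 2 : $\begin{pmatrix} x & 0 \\ 0 & x \end{pmatrix}$

link 3 -> 4 : $\begin{pmatrix} x & 1 \\ x & 0 \end{pmatrix}$

link 4 -> 2 : $\begin{pmatrix} x & 1 \\ 0 & x \end{pmatrix}$

link 4 -> 4 : $\begin{pmatrix} x & 0 \\ x & 0 \end{pmatrix}$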

Thus, the mapping of every individual transmission can be followed
using the proposed model. This is in line with the fact that there
exists a path between every pair of input-output lines in an Omega
interconnection network. Omega, however, is a blocking MIN, so not
every permutation can be mapped without conflict. In a conflict-free
situation, the Control matrix for the entire permutation, which is
essentially a group of N individual links, may be derived from the
Control matrices of the individual links.


Let us consider a permutation (2314) to be mapped using the
proposed S-net model for the Omega MIN. The transmission links
involved are thus (1 -> 3), (2 -> 1), (3 -> 2) and (4 -> 4). The
respective Control matrices for the links are to be considered
simultaneously to identify conflicts, if any. A conflict arises
when, for some $i \in [1..k]$ and $j \in [1..m]$, the entry $C_{ij}$
contains 1 in the Control matrix for one link and 0 in the Control
matrix for another. The Control matrix for the permutation (2314)
through the Omega network is

$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$
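
The merging of per-link Control matrices described above can be sketched as follows. The function name `merge_control_matrices` is hypothetical, and the four link matrices are written out as they follow from the model description (rows are passes, columns are Control places C1 and C2); they are an assumption of this sketch rather than a quotation.

```python
X = None  # don't-care entry


def merge_control_matrices(matrices):
    """Combine per-link Control matrices into one matrix for the permutation.

    Raises ValueError if two links demand 1 and 0 at the same entry,
    which is the conflict condition described in the text.
    """
    rows, cols = len(matrices[0]), len(matrices[0][0])
    merged = [[X] * cols for _ in range(rows)]
    for matrix in matrices:
        for i in range(rows):
            for j in range(cols):
                value = matrix[i][j]
                if value is X:
                    continue
                if merged[i][j] is not X and merged[i][j] != value:
                    raise ValueError(
                        f"conflict at pass {i + 1}, Control place C{j + 1}")
                merged[i][j] = value
    return merged


# Links of the permutation (2314): 1->3, 2->1, 3->2, 4->4.
links = [
    [[1, X], [X, 0]],  # 1 -> 3
    [[1, X], [0, X]],  # 2 -> 1
    [[X, 0], [0, X]],  # 3 -> 2
    [[X, 0], [X, 0]],  # 4 -> 4
]
print(merge_control_matrices(links))  # [[1, 0], [0, 0]]
```

Entries that remain don't-care after merging are left as `None`; a caller could set them to 0 or 1 freely.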


IV.  Detection  of  Fault  using  S-net 

The present paper aims at efficient detection of different types of
faults using the proposed S-net or its variations. Corresponding to
every stage of a MIN, there is an expected output pattern for a
given input pattern. Thus an input pattern (p1 p2 p3 ... pn) gets
modified at the different stages of a MIN as it is mapped through
the network. The different stuck-at faults, or the complete failure
of one or more constituent crossbar switches, can be detected by
observing the actual output patterns of the stages of the MIN and
comparing them with the corresponding expected output patterns. If
the two patterns are not identical, the presence of faults, their
type and their position may be detected. Even if the expected and
actual patterns are the same, however, certain types of stuck-at
faults may still be present in individual switching elements. An
algorithm is proposed to detect faults and their positions in such
cases.

Without a high-level net like the S-net model, however, all the
input and output links of each and every switch have to be checked
to detect faults. Thus O(N log2 N) links have to be checked
altogether to detect all possible faults in an NxN MIN. Further,
different approaches have to be adopted to detect faults in
redundant and non-redundant types of MINs. The following algorithm
exploits the main advantage of the model, namely that only O(N)
representative Data places stand for all the O(N log2 N) links of a
MIN. Detection of various faults for a wide range of MINs thus
becomes much easier with the S-net model of the MIN.

At a particular stage k of the MIN, the N/2 switches in the stage
hold an input pattern $(i_{k1}\ i_{k2}\ \ldots\ i_{kN})$ at the N
input links. This is represented by the content of the N Data places
for colour k before the Controlled transitions are fired. Depending
upon the content of the Control matrix as mentioned above, the
Controlled transitions are fired, and just before any of the
Immediate transitions for the pass are fired the content of the Data
places in the model reflects the expected output pattern
$(o_{k1}\ o_{k2}\ \ldots\ o_{kN})$ of stage k. One may then check
whether the actual output pattern
$(oa_{k1}\ oa_{k2}\ \ldots\ oa_{kN})$ is the same as
$(o_{k1}\ o_{k2}\ \ldots\ o_{kN})$. A bitwise operation can detect a
position q where the actual content of the output link $oa_{kq}$ and
the expected value $o_{kq}$ are not the same. This indicates that
the $\lceil q/2 \rceil$-th switch at stage k of the MIN is faulty.
An O(N log2 N) algorithm is presented below to describe fault
detection in MINs using the proposed S-net model.

Procedure DetectFault
Var
    boolean flag;
    bit I[M][N], O[M][N], OA[M][N];
    integer k, q;
Begin
    flag = .T.;
    For k = 1 to M    /* M is O(log2 N) and represents the number
                         of stages in the MIN */
        For q = 1 to N
            Derive O[k][q] from I[k][q] and the Control matrix
                entry for pass k;
            If (O[k][q] XOR OA[k][q]) = 1 then
                Indicate fault in the ceil(q/2)-th switch of stage k;
                flag = .F.;
            Endif
        Endfor
        Fire Immediate transitions;
    Endfor
    If flag == .T. then
        Permutation may be mapped successfully;
    Endif
End
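
The procedure above can be made concrete with a small Python sketch. It is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, tokens are compared with `!=` instead of a bitwise XOR, the observed stage outputs are passed in as a list, and the Immediate transitions are assumed to realise a perfect shuffle between stages.

```python
def shuffle(data):
    """Assumed cascade wiring between stages: perfect shuffle."""
    n = len(data)
    out = [None] * n
    for i, token in enumerate(data):
        out[i if i == n - 1 else (2 * i) % (n - 1)] = token
    return out


def expected_stage_output(data, control_row):
    """Fire the Controlled transitions of one pass on the Data places."""
    out = list(data)
    for j, crossed in enumerate(control_row):
        if crossed:
            out[2 * j], out[2 * j + 1] = out[2 * j + 1], out[2 * j]
    return out


def detect_fault(inputs, control, actual):
    """Return a list of (stage, switch) pairs, both 1-based, that look faulty.

    actual[k] is the observed output pattern of stage k+1.
    An empty list corresponds to flag == .T. in the pseudocode.
    """
    faults = []
    data = list(inputs)
    for k, row in enumerate(control):
        expected = expected_stage_output(data, row)
        for q, (exp, act) in enumerate(zip(expected, actual[k])):
            if exp != act:
                fault = (k + 1, q // 2 + 1)  # ceil(q/2) for 1-based q
                if fault not in faults:
                    faults.append(fault)
        # Immediate transitions feed the next stage of the model.
        data = shuffle(expected) if k < len(control) - 1 else expected
    return faults


control = [[1, 0], [0, 0]]
healthy = [[2, 1, 3, 4], [2, 3, 1, 4]]
stuck = [[2, 1, 3, 4], [3, 2, 1, 4]]  # switch 1 of stage 2 stuck crossed
print(detect_fault([1, 2, 3, 4], control, healthy))  # []
print(detect_fault([1, 2, 3, 4], control, stuck))    # [(2, 1)]
```

Like the pseudocode, the sketch compares each stage against the model's expected pattern, so a fault in an early stage can also show up as mismatches in later stages.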

The algorithm presented above checks for faults that might block a
particular permutation. It does not, however, ensure that the whole
MIN is fault-free even if the variable flag is found to be .T. after
the final iteration. For example, if a switch has a stuck-at-T fault
and the corresponding Control matrix entry for permutation P is set
to 0, the permutation can be mapped successfully even in the
presence of the fault.

Thus  for  detection  of  multiple  stuck  at  faults,  in  a  non- 
redundant  path  network,  the  algorithm  DetectFault  is  to  be 
operated  in  two  passes.  In  pass  1,  all  the  control  matrix 
entries  are  set  to  0  whereby  the  Stuck-at-X,  Stuck-at-U 
and  Stuck-at-L  faults  are  detected.  In  pass  2,  all  the 
control  matrix  entries  are  set  to  1  and  the  Stuck-at-T, 
Stuck-at-U  and  Stuck-at-L  faults  are  detected.  Thus  under 
fault-free  condition,  in  pass  1,  the  Identity  permutation 
should  be  realized,  whereas  in  pass  2,  the  Complement 
permutation  should  be  realized. 

Similarly, for Redundant path and Partially Redundant MINs as well,
all types of faults can be identified with the help of the S-net
model. There are two basic advantages in using the S-net model.
Firstly, the approach provides a snapshot, in the sense that the
same Data places represent the entire network for different colours.
Instead of looking into the four input/output links of each of the
O(N log2 N) switches, the reliability of the MIN may be decided just
by observing the content of a fixed number of N Data places for any
NxN network. This helps in designing a simple but efficient fault
detection algorithm.

Apart  from  this,  the  introduction  of  Control  matrix  makes 
it  more  convenient  to  understand  mapping  of  a 
permutation  through  different  stages  of  a  MIN.  This  is 
also  quite  helpful  for  any  performance  analysis  of  the 
network  as  instead  of  physically  setting  the  crossbar 
switches  with  some  control  signal,  the  impact  can  be 
studied  just  by  altering  the  corresponding  Control  matrix 
entry  and  then  looking  into  the  changes  in  the  S-net  model. 

V.  Conclusion 

The methodology for modelling MINs using the proposed S-net has a
wide range of applicability. These high-level net models are
designed to achieve optimum compactness so that analysis can be done
more efficiently. The total number of Data and Control places in the
proposed S-net model for any MIN is much less than the number of 2x2
cross-bar switches required to build the network itself. The number
of switching elements required for an NxN MIN is O(N log2 N),
whereas the corresponding S-net model consists of only N Data places
and N/2 Control places. Moreover, for any NxN network, the number of
places is a linear function of N only. Modelling with S-net is thus
quite effective for compact representation of interconnection
networks as well as for detecting different types of faults. Instead
of looking into the input/output links of each of the O(N log2 N)
switches, the performance of a MIN may be studied, and faults may be
detected, just by observing the content of a fixed number of N Data
places for any NxN network. Further, the introduction of the Control
matrix and the algorithm discussed in section IV suggest that the
present work may be extended to study and detect stuck-at faults in
a wide range of MINs. It is therefore proposed to consolidate this
research work by analysing different interconnection networks on the
basis of this model.
the wrapper from a specification


1.  Introduction 
1.1  Background 


types, sizes and byte ordering, in order to make them suitable for interoperation.
These problems make interoperable applications difficult to construct and manage.


1.2 Current State-of-the-art solutions

low-level'  interoperability  problems  range  from 

like  object  resource  brokers  (CORBA^  DCOM)  middleware  technology 

This technology uses a higher level of abstraction than messaging, and can
simplify the construction of interoperable applications. It provides a bridge
between the service provider and the requester by providing standardized
mechanisms that handle communication and data marshalling. The
implementation details of the middleware are generally not important to
developers building the systems. Instead, developers are concerned with the
interface details. This form of information hiding enhances system
maintainability by encapsulating the communication mechanisms and providing
a stable interface for the developers. However, developers still need to
perform significant work to incorporate the middleware's services into their
systems. Furthermore, they must have a good knowledge of how to deploy the
middleware services to fully exploit the features provided.

services I^Ttty  c^dTo^  severs.  Any" aS?"  “  **  design  ‘  the  data  and 
tighuontrol  enCOUnte^  ^P^lT^ues  duftofeis 

1.3  Motivation 

““atp“sPrc^  z:,tZnfr‘  parsm-  **  - » » ^ 

uncoupled  from  any  particular  process  Prc  rr^  ”  WOrk  on  the  ^  ■«  a!so 

at  the  same  time.  So  far,  building  distributed  d  ♦  WOrk  on  dlfferent  pieces  of  data 

interface  has  proved  to  be  more  dauntino  than  ot£  co'™^  ,‘°gether  with  ,heir  requisite 
techniques.  The  arrival  of  JavaSpace  has  changed  ,  entIonal  interoperability  middleware 
creation  and  access  of  distributed  objects  Howe ve  I  T™  *  S°me  extent‘ *  aIIows  easy 
network,  duplicated  data  items,  out-dated  data  external  el  C°n,CenVng  data  8ettmg  lost  in  the 
of  communication  between  the  data  owner  and  data  m  dli"g  and  handshaking 

to  devise  ways  to  solve  those  problems  and 

1.4  Proposal 

SeetatTProVe  a  —  -king  on 
objects  as  local  objects  within  the  of  bating  distributed 

distributed  object  as  if  it  is  local  within  the  process  ^  d£Velopers  could  then  modify  the 
be  reflected  on  other  applications  using  that  distributed  ^  however’  stih  need  to 

related  to  inconsistency.  The  current  fesearch  aims  bjCCt  wlthout  creating  any  problems 
model  of  an  interface  wrapper  that  can Te  useTf  mng  tWS  °bjeCtive  ^  ^ing  a 
addition,  by  automating  the  process  of  generating  the  L'T*5'  °f  distributed  objects.  In 
interface  specification  of  the  requirem 
Generator (AICG) has been developed to

the  Prototype  System  Description  Language  (PSD^  rLUOgTl  a^Clfl“tl0n  ^nguage  called 
distnbuted  data  structure  and  JavaSpace  TechnnW  ,  '  Th  US6S  the  PnnciPle  of 

synchronization,  and  notification  together  with^ifetime  control  ^  *anSaCtl0n  contro1’ 
that  treats  distributed  objects  as  if  there  were  local  within  the 

2.  Review  of  Existing  Works 
2.1  ORB  Approaches 

?  -s?  ?»,  ««*  «. *. 

interoperability  2)  Architectures  for  unT  ac  es  [  J  include  l)Building  blocks  .for 

for  encapsulating^ inKroperabiJ^ty  ““  3) 

Kivia,  graphs  by  Berzins  [1]  J,h  ^  wS,  toof  Ae  ^  ‘he 

summaiy  of  the  strong  and  weak  points  of  various  approaches  ORB  an^f  S  ^ 
the  more  promising  technologies  for  interoperability.  a re  curreil,1y 

There  are  however,  some  concerns  with  the  OPR  c  ir  ri~, 

depth  analysis  of  the  DCOM  model  hiohli^htino  th*  ;  SulIlvan  H3]  provides  a  more  in- 

Interface  Negotiation  (how  a ^ 

Aggregation  (component  composition  mechanism)  The^T'065  3nd  interface)  and 
flinction  properly  within  the  aoarpaatpw  hryt  a  1,.  6  interface  negotiation  does  not 

share  an  interface  An  interface  is  shared  if  th^  1S  pr0^ern  arises  because  components 
several  components  canretuma  L  "  c°nstructor  or  Querylnterface  functions  of 

shared  mterfa^h  u d  be  b™  fate^^nf06  rate  '1,at  a  h<>lte  °pa 

and  outer  components.  How  ^e  ”  2  a“  can  reform  0nb0,h  im“ 

types  appearing  on  an  inner  component  the  inS  i  prov,dL“Krfa«a  of  “me 

fads  to  work  properly  with  respect  to  delegation  a  the  inner  interface”''  “S‘ QueiyInterface 

d“;^  of  the  techniques  is  required  to 

standalone  programming  techniaues  Addino  grai’lmors  however,  are  train  mostly  on 
increases  the  ifamin  “a°s  well  a”  devote  8  ,speclallzed  "et»ork  programming  models 
deadlines.  FurtheTo“re  bi  m  T’  W"h  °CCasi°nal  slippa^  »f  target 

consequences  of  Mure  ^  more  J ”1  prPgrams  are  harder  “  d«a«  id 

programs  to  go  astray  in  a  connected  distributed'en^roturient'lq]  {Ef”  C““  °ta 


2.2  Prototyping 

,0  the  P<*»  »  quantum 

solutions  to  this  problem  Completely  automated1  prototypin°  1S  one  of  the  most  promising 
level  language  is  feasible  and  imfact  genemmn  ofsTeTer°n  °f  pr0t0type  from  a  veiy  ^gh- 
common  in  the  computer  world.  One  major  advantage  nf  th  °  programmm°  structures  is  very 
that  it  frees  the  develoners  from  the  J  i  S  ofthe  automatic  generation  of  codes  is 

reusable  components  details  by  executing  specification  via 

environment,  named  Computer 

rapid  prototyjdng  ofTard  “  ‘he  NaVal  School.  for 

systems,  space  shuttle  avionics  systems^ and* r' SySt™i  S“Ch  aS  missile  Smdance 
and  Intelligence  (C3I)  systems  Till  Rani'd  .  1  3ry  Command,  Control,  Communication 

help  both  the  deve  opers  and  their  cZlrT^  "T  ^  C°nStruCted  pr0t0types  *> 
properties  in  an  iterahvl  process  The hZZrTp^  f  DPr°P°Sed  SyStem  and  assess  its 
Language  (PSDL).  It  serves  as  an  executable  f  ^  ^  Pr°t0typmg  System  Description 
design  level  and  has  soecial  feature,  !  prototyping  language  at  a  specification  or 

computer  aided  rapid  prototyping  systemTcAPS^  rTlTth  ^tpr  Budding  on  the  success  of 
for  the  specification  and  «f f  T^'  al$°  USeS  the  PSDL 

making  the  network  transparent  from  the  Seveloper's  p  “nt'oMe”  W“h  "*  °bjeC‘iVe  °f 
2.3 Transaction Handling

o 

»  stand-alone  system  in 

networked  application  The  networked  ,v,te  k  ,  C3re  °f  f°r  smooth  functioning  of  a 
computation.^which  can  leave^h^systern  hTanTnconsi  ^tent  state^1*^6  "  ^ 

Sta  ,h“m  y^6,  Tatned  thT1  and  “  ““y  “d 

(CC)  into  either  ’TtT"8  C<>nCU’™‘:y  ^ 

configurations  is  greatly  limited.  Hence  J presented  flex'blllty  of  ,hese 

transaction  server,  which  carries  out  the  *  presented  a  middleware  approach:  an  external 
obtaining  the  data  Advantages  of  this  a  ncurrency  control  policies  in  the  process  of 

tailored  to  applythe  ^slred*  OC^ohc^s  T  °  tr3nS3Cti°n  server  can  be  easily 

not  require  any  changes  to  the  serve  speci  ic  client  applications.  2)  The  approach  does 

model  3)  (SStion ^  **  “  transaction 

possible  if  all  ofthe  clients  use°the  ^  CC  ^  " 

provided  bymsw  T^nSZSTlfi)  ^  df0ying  “  eX,en,al  —set 

created  and  overseen  by  the  mLTgt  M"SaCt,°nS  by  ,he  d“"G  “<*  “-ers  are 


3.  The  Basic  Model 


encapsulating some of the features of JavaSpace and Jini to provide a
simplified way of developing distributed applications. Section 3.1 examines
the principles of JavaSpace, and Section 3.2 discusses some of the features
of the AICG model.


Figure  1,  AICG  Model 

3.1  The  JavaSpace  Model 


distributed environment. It departs from conventional distribution
techniques that use message passing between processes or invoke methods on
remote objects. The technology provides a fundamentally different
programming model that views an application as a collection of processes
cooperating via the flow of objects into and out of one or more spaces. This
space-based model of distributed computing has its roots in the Linda
coordination language [3], developed by Dr. David Gelernter of Yale
University.


Figure  2,  JavaSpace  operations 


A space is a shared, network-accessible repository for objects that acts as
a persistent object store. As shown in figure 2, processes perform simple
operations to write new objects into a space, take objects from a space, or
read (make a copy of) objects in a space. When taking or reading objects,
processes use a simple value-matching lookup to find the objects that matter
to them. If a matching object is not found immediately, a process can wait
until one arrives. Unlike conventional object stores, processes do not
modify objects in the space or invoke their methods directly. To modify an
object, a process must explicitly remove it, update it, and reinsert it into
the space. During the period of updating, other processes requesting the
object will wait until the process writes the object back to the space.
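
The write, read and take operations described above can be illustrated with a minimal in-memory sketch. This is not the JavaSpaces API: the class `Space`, its method signatures and the use of dictionaries as entries are assumptions made purely for illustration, with `None` in a template acting as a wildcard for value matching.

```python
import threading


class Space:
    """Toy, single-process stand-in for a shared object space."""

    def __init__(self):
        self._entries = []
        self._cond = threading.Condition()

    def write(self, entry):
        """Insert a copy of the entry and wake up waiting processes."""
        with self._cond:
            self._entries.append(dict(entry))
            self._cond.notify_all()

    def _find(self, template):
        # Value-matching lookup: every non-None template field must match.
        for entry in self._entries:
            if all(entry.get(k) == v
                   for k, v in template.items() if v is not None):
                return entry
        return None

    def read(self, template, timeout=None):
        """Return a copy of a matching entry, waiting until one arrives."""
        with self._cond:
            if not self._cond.wait_for(
                    lambda: self._find(template) is not None, timeout):
                return None
            return dict(self._find(template))

    def take(self, template, timeout=None):
        """Remove and return a matching entry, waiting until one arrives."""
        with self._cond:
            if not self._cond.wait_for(
                    lambda: self._find(template) is not None, timeout):
                return None
            entry = self._find(template)
            self._entries.remove(entry)
            return entry


space = Space()
space.write({"type": "track", "tracknumber": 7, "callsign": "A1"})

# To modify an object: remove it, update the local copy, reinsert it.
track = space.take({"type": "track", "tracknumber": 7})
track["callsign"] = "B2"
space.write(track)
print(space.read({"tracknumber": 7})["callsign"])  # B2
```

While the entry is taken, a concurrent `read` or `take` with the same template blocks until the updated entry is written back, which mirrors the waiting behaviour described in the text.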

Key  Features  of  JavaSpace- 

:  £SS2sa» — 

•  JavaSpace provides a transaction model that ensures that an operation on
a space is atomic. Transactions are supported for single operations on a
single space, as well as multiple operations over one or more spaces.

•  In a space, objects are just passive data; however, when we read or take
an object from a space, a local copy of the object is created. Like any
other local object, we can modify its public fields as well as invoke its
methods.

3.2 The AICG Model

The  tool  is  7  building  distributed  applications, 

be  shared,  and  are  particularly  useful  for  annbca/  7  Structures  or  objects  that  need  to 
through  one  or  more  servers.  Build  on  top  of  JavaWc  as  flows  of  objects 

its  implementation  details  entirely  from  the  P,,v he  A^G  model  hldes  the  sPace  and 
applications  to  treat  distributed  data  structures  of  obtcJ1'  f  T rfaCe  wrapper  apows 

enhanCed  interoperabi,ity  by  -king  the  network$ transparent  to  £ 

psoirr^ description  ianguage 

sstsi  “  -  «  was 

application  code  need's  n^^p^d^how  ttf  W.lthin  .the  aPPhcation  process.  The 

object  copy  is  always  synchronous  with  the  IS  dlstnbuted’  since  the  local 

•  Synchronization  with  S?  the  distributed  copy,  (see  section  5) 

model  is  based  on  the  space  transact^  ^  aUt0mat^aIly  handled.  Since  the  AICG 

Deadlock  is  prevented  automatically  withfn^6  m°d5  ’  a11  °Peratl°ns  are  atomic. 

distributed  copy,  and  Ihmugh^ans^ctioncontro^fse^sectfon'b,  g3V*n®  a 
•  Transactions are secure by default. AICG transactions are based on the
Atomicity, Consistency, Isolation, and Durability (ACID) properties (see
section 8).

•  Changes to distributed objects can be observed through the AICG event
model. Clients can subscribe for change notification, and when the
distributed objects are modified, a separately spawned thread executes the
callback method defined by the developer.

■  descnp,,ve  ** 

4.  Developing  Distributed  Application  with  the  AICG  Tool 

molr^mntofVS,^  ^  deVel0pin*  distrib“ttd  WMons  usina  the  AICG 
4.1  Development  Process 

°|f-  ustng  the 

generator  (PSDLtoSpace,  l  produce 


Figure 3, PSDL to Space (the PSDLtoSpace generator takes the PSDL
definition of the distributed objects and produces a set of interface
wrapper files)

Figure 4, Generating the interface

4.2 Input definition to the Code generator


the development of one of the many distributed objects


TYPE track
SPECIFICATION
tracknumber: integer
END

OPERATOR track
SPECIFICATION
INPUT x: integer
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = CONSTRUCTOR
END
END

OPERATOR getID
SPECIFICATION
OUTPUT x: integer
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = READ
END
END

OPERATOR setCallsign
SPECIFICATION
INPUT sign: string
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = WRITE
PROPERTY TRANSACTIONTIME = 300
END
END

OPERATOR getCallsign
SPECIFICATION
OUTPUT sign: string
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = READ
END
END

OPERATOR setPosition
SPECIFICATION
INPUT post: position type
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = WRITE
PROPERTY TRANSACTIONTIME = 2000
END
END

OPERATOR getPosition
SPECIFICATION
OUTPUT post: position type
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = READ
END
END
END

IMPLEMENTATION
SPACE
PROPERTY SPACENAME = DODSpaces
PROPERTY OWNERSHIP = YES
PROPERTY SECURITY = SERVER
PROPERTY LEASE = 12000
PROPERTY CLONE = MANY
PROPERTY NOTIFY = NO
PROPERTY RETRY = 10
END

Figure 5, Track example in PSDL

TYPE track_list
SPECIFICATION
END

OPERATOR track_list
SPECIFICATION
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = CONSTRUCTOR
END
END

OPERATOR getID
SPECIFICATION
INPUT index: integer
OUTPUT x: integer
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = READ
END
END

OPERATOR setNewID
SPECIFICATION
INPUT id: integer
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = WRITE
END
END

OPERATOR removeID
SPECIFICATION
INPUT id: integer
END
IMPLEMENTATION
SPACE
PROPERTY SPACEMODE = WRITE
PROPERTY TRANSACTIONTIME = 2000
END
END
END

IMPLEMENTATION
SPACE
PROPERTY SPACENAME = DODSpaces
PROPERTY OWNERSHIP = YES
PROPERTY SECURITY = SERVER
PROPERTY LEASE = 0
PROPERTY CLONE = ONE
PROPERTY NOTIFY = YES
PROPERTY RETRY = 5
END

Figure 6, Track list example in PSDL


(AooenSx  fTps™ 1°I  '  A‘CG  “  “  ex""ded  vm'0”  of  original  PSDL  grammar 
UeS  Howevef fte  MCC  T  “T  ^  Can  **  used  to  model  an  emire  distributed 
betoem  svsTems M  L J  “S  ‘  p0r,,0n  0f  the  PSDL  ,0  Scribe  «■=  interface 

psdl 

ri~L!XdndIheACOmPlete  liS,ing  °f  ,he  ^  “  the  —  can 


The track PSDL starts with the definition of a type called track. It has
only one identification field, tracknumber. Of course, the track objects can
have more than one field, but this field is in this case used to uniquely
identify any particular track object. The type track_list shown in figure 6,
on the other hand, does not need an identification field, since there is
only one track_list object in the whole system. The track list is used to
keep a list of the tracknumbers of all the active tracks in the system at
that moment in time.

The methods of the type are defined immediately after the specification.
Each method has a list of input and output parameters that define the
arguments of the method.


The most important portion of the method declaration is the implementation.
The developer must define the type of operation the method is supposed to
perform. The operations are constructor (used to initialize the class), read
(no modification to any field in the class) and write (modification is done
to one or more fields in the class). These are necessary because the
generated code encapsulates the synchronization of the distributed objects.

The other field in the implementation portion of the method is
transactiontime, which defines the upper limit, in milliseconds, within
which the operation must be completed. The transaction property is
discussed in detail in Section 8.


Upon running the example in figure 5 through the generator tool, a set of
Java interface wrapper files is produced. Developers can ignore most of the
generated files except the following:


•  Track.java: this file contains the skeleton of the fields and the methods
of the track class. The user is supposed to fill in the bodies of the
methods.

•  TrackExtClient.java: this is the wrapper class that the client
initializes and uses instead of the track class.

•  TrackExtServer.java: this is the wrapper class that the server
initializes and uses in place of the track class.

•  NotifyAICG.java: this class must be extended or implemented by the
application if event notification and call-back are needed.


The methods found in trackExtClient and trackExtServer have the same names
and signatures as those of the track class. In fact, the track class methods
are called from within trackExtClient or trackExtServer.
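
The delegation idea can be sketched as follows. This is a hypothetical Python illustration, not the generated Java code: the class `TrackWrapper`, the dictionary used as a stand-in for the space, and the method names are all assumptions. It only shows how a wrapper can keep the signatures of the track class while mapping READ and WRITE modes onto copy and remove-modify-write-back steps.

```python
import copy


class Track:
    """Plays the role of the user-completed skeleton class."""

    def __init__(self, tracknumber):
        self.tracknumber = tracknumber
        self.callsign = ""

    def get_callsign(self):
        return self.callsign

    def set_callsign(self, sign):
        self.callsign = sign


class TrackWrapper:
    """Hypothetical stand-in for a generated wrapper such as TrackExtClient."""

    def __init__(self, store, tracknumber):
        self._store = store          # dict standing in for the shared space
        self._key = tracknumber
        if tracknumber not in store:  # CONSTRUCTOR mode: create the object
            store[tracknumber] = Track(tracknumber)

    def get_callsign(self):
        # READ mode: work on a local copy, the shared object is untouched.
        local = copy.deepcopy(self._store[self._key])
        return local.get_callsign()

    def set_callsign(self, sign):
        # WRITE mode: remove, call the real method, write back.
        local = self._store.pop(self._key)
        local.set_callsign(sign)
        self._store[self._key] = local


shared = {}
client_a = TrackWrapper(shared, 1)
client_b = TrackWrapper(shared, 1)
client_a.set_callsign("EAGLE")
print(client_b.get_callsign())  # EAGLE
```

Application code calls `set_callsign` and `get_callsign` exactly as it would on a local `Track`, which is the transparency the wrapper is meant to provide.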

5.  Distributed  Data  Structure  and  Loosely  Coupled  Programming 

Conceptually, a distributed data structure is one that can be accessed and
manipulated by multiple processes at the same time without regard for which
machine is executing those processes. In most distributed computing models,
distributed data structures are hard to achieve. Message passing and remote
method invocation systems provide a good example of the difficulty. Most of
these systems tend to keep the data structure behind one central manager
process, and processes that want to perform work on the data structure must
“wait in line” to ask the manager process to access or alter a piece of data
on their behalf. Attempts to parallelize or distribute a computation across
more than one machine face bottlenecks, since the data are tightly coupled
to the one manager process. True concurrent access is rarely achievable.

Distributed  data  structures  provide  an  entirely  different  approach  where  we  uncouple  the  data 
from  any  particular  process.  Instead  of  hiding  data  structure  behind  a  manager  process,  we 
represent  data  structures  as  collections  of  objects  that  can  be  independently  and  concurrently 
accessed  and  altered  by  remote  processes.  Distributed  data  structures  allow  processes  to  work 
on  the  data  without  having  to  wait  in  line  if  there  are  no  serialization  issues. 


The distributed protocol for modification ensures synchronization by enforcing that a process
wishing to modify an object has to physically remove it from the space, alter it and write it
back to the space. There can be no way for more than one process to modify an object at the
same time. However, this does not prevent a process from overwriting an object incorrectly.
For example, in a JavaSpace, suppose that instead of a “take” operation followed by a write
operation, the programmer wrote a “read” operation followed by a “write” operation. This
results in 2 copies of the object in the space. The AICG model prevents this kind of error.

Loosely-coupled programming has its pitfalls also. Distributed objects may be lost if a
process removes them from the space and subsequently crashes or is cut off from the network.
Similarly, the system may end up in a deadlock state if processes, at the same time, hold on
to distributed objects required by other processes. The AICG model uses transactions to
ensure that either all of the operations complete or none occur, thereby maintaining the
integrity of the application. With transaction control, deadlock is prevented if the process
did not complete the operation within a certain permitted time. The application can retry the
operation immediately or wait for a random time before performing the operation again.

6.  Synchronization 

Synchronization plays a crucial role in any design of distributed application. Inevitably,
processes in a distributed system need to coordinate with one another and avoid bringing the
system into an unstable state such as deadlock. Creating distributed applications with AICG
requires little explicit synchronization, since synchronization is already built in: a process
that wants to modify an object has to remove it from the space and thereby gain exclusive
access to it. Hence, coordinated access to objects is enforced by the AICG interface doing
read, take and write operations.

More advanced and complex synchronization schemes can easily be built upon the basic
atomic features of the AICG operations. An example is semaphores. Semaphores, a
synchronization construct that was first used to solve concurrency problems in operating
systems, are commonly found in multithreaded programming languages, but are more
difficult to achieve in distributed systems. Semaphores are typically implemented as integer
counters that require special language or hardware support to ensure the atomic properties of
the UP (signal) and DOWN (wait) operations. Using the AICG space model we could easily
implement a semaphore as a shared variable that holds an integer counter. By assigning a
distributed variable or object as a semaphore, groups of distributed objects can be controlled.
Hence, the AICG model permits developers to build more complicated distributed
applications without being concerned about synchronization and deadlock. Furthermore, all
operations within the AICG model can impose transaction control with a timeout. After the
timeout period, the transaction is rolled back and the application is notified.
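The counter-based semaphore described above can be sketched as follows. This is an illustration only: `CounterSpace` is a hypothetical in-memory stand-in for the space (it is not the JavaSpaces or AICG API), and all class and method names are assumptions made for the example. The point is that taking the counter entry gives exclusive access, so UP and DOWN need no extra locking protocol.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical in-memory stand-in for a space holding Integer entries. */
class CounterSpace {
    private final Deque<Integer> entries = new ArrayDeque<>();

    /** Write an entry into the space and wake up any waiting takers. */
    synchronized void write(Integer entry) {
        entries.addLast(entry);
        notifyAll();
    }

    /** Remove and return an entry; returns null if none arrives within timeoutMs. */
    synchronized Integer take(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (entries.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return null;
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return entries.removeFirst();
    }
}

/** Semaphore kept as a single integer counter entry in the space. */
class SpaceSemaphore {
    private final CounterSpace space = new CounterSpace();

    SpaceSemaphore(int permits) {
        space.write(permits);
    }

    /** DOWN (wait): returns true if a permit was obtained. */
    boolean tryDown(long timeoutMs) {
        Integer count = space.take(timeoutMs);   // exclusive access to the counter
        if (count == null) return false;          // counter entry not available in time
        boolean acquired = count > 0;
        space.write(acquired ? count - 1 : count); // always put the counter back
        return acquired;
    }

    /** UP (signal): increments the counter. */
    void up(long timeoutMs) {
        Integer count = space.take(timeoutMs);
        if (count == null) throw new IllegalStateException("counter entry unavailable");
        space.write(count + 1);
    }
}
```

A caller that fails to obtain a permit can retry immediately or after a random delay, as discussed in section 5.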

7.  Object  Life  Time  (Leases/Timeout) 

Leasing grants access to a resource for a fixed period of time. This model is beneficial
in the distributed environment, where partial failure can cause the holders of resources to fail,
thereby disconnecting them from the resources before they can explicitly free them. In the
absence of a leasing model, resources could grow without bound.

The leasing model has other uses besides reclaiming resources. In a real-time system, the
value of the information regarding some distributed objects decreases with time, and
obsolete information can be damaging. By setting the lease on the distributed object, the
AICG model automatically removes the object once the lease expires or the deadline is
reached.

JavaSpaces allocate resources that are tied to leases. When a distributed object is written
into a space, it is granted a lease. The holder of the lease can renew it or cancel it. If the
holder does neither, the lease simply expires, and the space removes the entry from its store.

The AICG model supports two lease durations. These are:

1. Maximum duration: the object remains in the space even in the event that the
leaseholder (the process that creates the object) has died. This is selected by setting the
SPACE lease property in the implementation to 0.

2. Fixed duration of x ms: an operation must be performed on the object before the lease
expires. This duration is set through the SPACE lease property.

JavaSpaces takes an explicit approach towards leasing, in which the holder renews the lease
itself, while in the AICG model renewal is done by calling any method that modifies the
object. If no modification is required, the developer can consider defining a method with the
SPACE property “write”. Invoking that method renews the lease.
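The lease behaviour described above (expiry after a fixed duration, renewal as a side effect of modification) can be illustrated with a small sketch. `LeasedStore` and its methods are hypothetical names for this example and are not part of JavaSpaces or the AICG; the convention that a lease value of 0 means "never expires" follows the SPACE lease property described above. The current time is passed in explicitly so that the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store whose entries disappear once their lease has expired. */
class LeasedStore {
    /** Assumption for this sketch: a lease of 0 means the entry never expires. */
    static final long FOREVER = 0;

    private static final class Slot {
        Object value;
        long leaseMs;
        long expiresAt;
    }

    private final Map<String, Slot> slots = new HashMap<>();

    /** Store a value under key with the given lease, starting at time nowMs. */
    synchronized void write(String key, Object value, long leaseMs, long nowMs) {
        Slot s = new Slot();
        s.value = value;
        s.leaseMs = leaseMs;
        s.expiresAt = (leaseMs == FOREVER) ? Long.MAX_VALUE : nowMs + leaseMs;
        slots.put(key, s);
    }

    /** Return the value, or null if it is absent or its lease has expired. */
    synchronized Object read(String key, long nowMs) {
        Slot s = slots.get(key);
        if (s == null) return null;
        if (nowMs >= s.expiresAt) {
            slots.remove(key);   // expired entries are removed from the store
            return null;
        }
        return s.value;
    }

    /** Modifying an entry rewrites it, which restarts its lease. */
    synchronized boolean modify(String key, Object newValue, long nowMs) {
        if (read(key, nowMs) == null) return false;
        write(key, newValue, slots.get(key).leaseMs, nowMs);
        return true;
    }
}
```

In this sketch an entry written at time 0 with a 100 ms lease and modified at time 90 remains readable until time 190.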

8.  Transactions
8.1  Jini Transaction model


All transactions are overseen by a transaction manager. When a distributed application needs
operations to occur in a secure manner, the process asks the transaction manager
to create a transaction. Once a transaction has been created, one or more processes can
perform operations under the transaction. A transaction can complete in two ways. If a
transaction commits successfully, then all operations performed under it are complete.
However, if problems arise, then the transaction is aborted and none of the operations occurs.
These semantics are provided by a two-phase commit protocol that is performed by the
transaction manager as it interacts with the transaction participants.
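The all-or-nothing outcome of a two-phase commit can be sketched as below. This is a simplified illustration, not the Jini transaction API: the `Participant` interface, `ToyTransactionManager` and `RecordingParticipant` are hypothetical names, and real protocols also deal with timeouts, crashes and recovery, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical participant in a two-phase commit. */
interface Participant {
    boolean prepare();  // phase 1: vote yes (true) or no (false)
    void commit();      // phase 2: make the changes permanent
    void abort();       // phase 2: discard the changes
}

/** Hypothetical manager that drives the two phases. */
class ToyTransactionManager {
    private final List<Participant> participants = new ArrayList<>();

    void join(Participant p) {
        participants.add(p);
    }

    /** Returns true if the transaction committed, false if it was aborted. */
    boolean complete() {
        // Phase 1: every participant must vote yes.
        for (Participant p : participants) {
            if (!p.prepare()) {
                // A single "no" vote aborts the transaction for everyone.
                for (Participant q : participants) q.abort();
                return false;
            }
        }
        // Phase 2: all voted yes, so all commit.
        for (Participant p : participants) p.commit();
        return true;
    }
}

/** Participant that records the outcome it was told about. */
class RecordingParticipant implements Participant {
    private final boolean vote;
    String state = "active";

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    public boolean prepare() { return vote; }
    public void commit() { state = "committed"; }
    public void abort() { state = "aborted"; }
}
```

With two participants that both vote yes, both end in the committed state; if either votes no, both end in the aborted state.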


8.2  AICG  Transaction  model 


The AICG model encapsulates and manages the transaction procedures. All operations on the
distributed object can be performed either with or without transaction control. Operations
under transaction control are given a default lease of 2 sec. This default value of leasing time
may, however, be overridden by the user. The transaction is kept by the transaction manager
as a leased resource; if the lease expires before the operation is committed, the transaction
manager aborts the transaction.

Transactions have the following desirable effect on the semantics of the AICG operations.
When a distributed object is created, the object is not seen or accessible outside of the
transaction until the transaction commits. However, when a distributed object is updated or
read within a transaction, it can come from a new object created within the transaction or
from objects in the space.


The AICG model, by default, enables transactions for all write operations, using the default
transaction lease time. The developer can modify the lease time through the PSDL SPACE
transactiontime property.


PROPERTY

transactiontime = 0 : Disable transaction for that method
                = n : Set the lease time to n ms.


The read operations in the AICG model do not have transactions enabled. However, the
developer can enable them by using the property transactiontime, which gives the upper limit
on the transaction time of the read operation. To use the same transaction for more than one
operation, the following property must be set.


PROPERTY

transactionid = 99 : An ID number that is the same for more than one method.
9.  AICG Event Notification


In  the  distributed  and  loosely-coupled  programming  environment,  it  is  desirable  for  an 
application  to  react  to  changes  or  arrival  of  newly  distributed  objects  instead  of  “busy 
waiting”  for  it  through  polling.  AICG  provides  this  feature  by  introducing  a  callback 
mechanism  that  invokes  user-defined  methods  when  certain  conditions  are  met. 

Java  provides  a  simple  but  powerful  event  model  based  on  event  sources,  event  listeners  and 
event  objects.  An  event  source  is  any  object  that  “fires”  an  event,  usually  based  on  some 
internal  state  change  in  the  object.  In  this  case,  writing  an  object  into  space  would  generate 
an  event.  An  event  listener  is  an  object  that  listens  for  events  fired  by  an  event  source. 
Typically,  an  event  source  provides  a  method  whereby  listeners  can  request  to  be  added  to  a 
list  of  listeners.  Whenever  an  event  source  fires  an  event,  it  notifies  each  of  its  registered 
listeners  by  calling  a  method  on  the  listener  object  and  passing  it  an  event  object. 

Within a Java Virtual Machine (JVM), an application is guaranteed that it will not miss an
event fired from within. Distributed events, on the other hand, have to travel either from one
JVM to another JVM within a machine or between machines networked together. Events
traveling from one JVM to another may be lost in transit, or may never reach their event
listener. Likewise, an event may reach its listener more than once.

Space-based  distributed  events  are  built  on  top  of  the  Jini  Distributed  Event  model,  and  the 
AICG  event  model  further  extends  it.  When  using  the  AICG  event  model,  the  space  is  an 
event  source  that  fires  events  when  entries  are  written  into  the  space  matching  a  certain 
template  an  application  is  interested  in.  When  the  event  fires,  the  space  sends  a  remote  event 
object  to  the  listener.  The  event  listener  codes  are  found  in  one  of  the  generated  AICG 
interface  wrapper  files.  Upon  receiving  an  event,  the  listener  would  spawn  a  new  thread  to 
process  the  event  and  invoke  the  application  callback  method.  This  allows  the  application 
codes  to  be  executed  without  involving  the  developer  in  the  process  of  event-management. 

There are a few steps for setting up an AICG event for a particular application. Firstly, the
distributed objects must have the SPACE property for Notification set to yes. One of the
application classes must implement the notifyAICG interface. The notifyAICG interface has
only one method, which is the callback method. The user class must implement this method
with the code that needs to be executed when an event fires.

10.  AICG  Design 

This  section  explains  the  design  of  the  AICG  and  the  codes  that  are  generated  from  psdl2java 
program.  The  codes  used  in  this  section  to  explain  the  AICG  and  the  development  processes 
are  generated  from  the  track  PSDL  of  section  4.2. 

10.1  AICG  Architecture 


The AICG architecture consists of four main modules. They are the Interface modules, the
Event modules, the Transaction modules and the Exception module. The interface modules
represent the distributed object and communicate directly with the application. In the track
example, the interface modules are entryAICG, track, trackExt, trackExtClient and
trackExtServer. Instead of creating the actual object (track), the application should
instantiate the interface object, either the trackExtClient or the trackExtServer. Event
modules (such as eventAICG and eventAICGID) handle the events coming from the space and
notify the application. Transaction modules (transactionAICG, transactionManagerAICG)
provide the interface modules with transaction services. Lastly, the exception module defines
the possible types of exceptions that can be raised and need to be handled by the application.
Figure 7 below shows the architecture of the generated interface wrapper and the interaction
with the other modules and application.

When the application class creates a new trackExtServer, the following events take place in
the trackExtServer:

1.  The track object is placed into the Entry object and stored in the space.

2.  Transaction Manager is enabled.

3.  The reference pointer to trackExtServer is returned to the application.


Figure 7. Architecture of the generated interface wrapper and the interaction with the other
modules and application

Each time a method (getCallsign, getPosition) that does not modify the contents of the object
is invoked, the following events take place in the Interface:

1.  The application invokes the method through the Interface
(trackExtServer/trackExtClient).

2.  The Interface performs a Space “get” operation to update the local copy.

3.  The method is then executed on the local copy of the object and the value is returned
back to the application.

Each time a method (setCallsign, setPosition) that does modify the contents of the object
is invoked, the following events take place in the Interface:

1.  The application invokes the method through the Interface.

2.  The Interface performs a Space “take” operation, which retrieves the object from the
space.

3.  The actual object method is then invoked to perform the modification.

4.  Upon completion of the modification, the object is returned to the space by the
Interface using a “write” operation.
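The two sequences above can be sketched as a small wrapper. The code is illustrative only and is not the generated AICG code: `MiniSpace` is a hypothetical in-memory stand-in for the space, its `read` method plays the role of the Space “get” operation, and `Track` and `TrackWrapperSketch` are simplified assumptions. A real space returns copies of the stored entries, whereas this sketch simply hands back references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for the actual object. */
class Track implements java.io.Serializable {
    private String callsign = "";
    String getCallsign() { return callsign; }
    void setCallsign(String sign) { callsign = sign; }
}

/** Hypothetical in-memory space that logs every operation it performs. */
class MiniSpace {
    private final Map<Integer, Track> entries = new HashMap<>();
    final List<String> log = new ArrayList<>();

    synchronized void write(int id, Track t) { log.add("write"); entries.put(id, t); }
    synchronized Track read(int id) { log.add("read"); return entries.get(id); }
    synchronized Track take(int id) { log.add("take"); return entries.remove(id); }
}

/** Wrapper exposing the same method names as Track. */
class TrackWrapperSketch {
    private final MiniSpace space;
    private final int id;
    private Track local;

    TrackWrapperSketch(MiniSpace space, int id, Track initial) {
        this.space = space;
        this.id = id;
        this.local = initial;
        space.write(id, initial);          // the object is stored in the space
    }

    /** Non-modifying method: refresh the local copy, then call the real method. */
    String getCallsign() {
        Track current = space.read(id);
        if (current != null) local = current;
        return local.getCallsign();
    }

    /** Modifying method: take, modify, write back. */
    void setCallsign(String sign) {
        Track taken = space.take(id);      // removes the object: exclusive access
        if (taken == null) throw new IllegalStateException("object not in space");
        taken.setCallsign(sign);           // the actual object method does the work
        local = taken;
        space.write(id, taken);            // return the object to the space
    }
}
```

Because the object is absent from the space between the take and the write, no other process can modify it during that interval.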

10.2  Interface  Modules 

The interface modules consist of the following: an entry (entryAICG) that is stored in the
space, the actual objects that are shared, and the object wrappers.

10.2.1  Entry 

A space stores entries. An entry is a collection of typed objects that implements the Entry
interface. The base class of the AICG distributed object:

import net.jini.core.entry.Entry;

public abstract class entryAICG implements Entry {

    // main identification number
    public Integer entryID;

    // required by JavaSpace
    // default constructor
    public entryAICG() { }

    public entryAICG(int id) {
        entryID = new Integer(id);
    }

    // return the object stored in
    // the entry
    public abstract Object getObject();
}


The Entry interface is empty; it has no methods that have to be implemented. Empty
interfaces are often referred to as “marker” interfaces because they are used to mark a class
as suitable for some role. That is exactly what the Entry interface is used for: to mark a class
as appropriate for use within a space.

All entries in the AICG extend from this base class. It has one main public attribute, an
identification number, and an abstract method that returns the object stored in the entry. Any
type of object can be stored in the entry. Below is the example track entry code generated by
the AICG from the PSDL file in figure 4. The entry contains the track object in one of its
fields and an ID.


public abstract class trackEntry
        extends entryAICG {

    // id is required if there are more
    // than one similar object in
    // the space
    public Integer id;

    // track object
    public track data;

    // default constructor
    public trackEntry() { }

    // Constructor with information
    // extracted from the track PSDL
    // file.
    public trackEntry(int aid, Integer inID,
                      track inData) {
        super(aid);
        data = inData;
        id = inID;
    }

    public Object getObject() {
        return data;
    }
}


The fields of an entry are declared public, although this is not the usual object-oriented
practice, because associative lookup is the way that space-based programs locate entries in
the space. To locate an object in space, a template is specified that matches the contents of
the fields. By declaring entry fields public, it allows the space to compare and locate the
object. The AICG preserves the object-oriented programming style by encapsulating the
actual data object into the entry. The fields of the actual data object can be declared as
private and made accessible only through its methods.

10.2.2  Serialization 


An application accesses a space through a local object that acts as a proxy to the remote space
object; it is not a reference to a remote object. The connection passes all operations
and values through the proxy to the remote space, so the objects must be serializable in order
to meet this objective. The Serializable interface is a marker interface that contains no
methods and serves only to mark a class as appropriate for serialization. Here is the
Serializable interface:


public abstract interface Serializable {
    // this interface is empty
}

In  that  case,  the  track  class  of  the  example  needs  to  implement  the  interface  Serializable. 

import java.io.Serializable;

public class track implements
        Serializable {

    // since Serializable is a marker
    // interface, no methods need to be
    // overridden.
}
10.2.3  The Actual Object


We now look at the actual objects that are shared between the servers and clients. The
psdl2java program generates a skeleton version of the actual class with the method names
and arguments. The bodies of the methods and the fields need to be filled in by the
developers. The track class generated is shown below:


public class track implements
        java.io.Serializable
{

    private Integer trackNumber;

    public track(int inID) {
        // insert the body here
    }

    public int getID() {
        // insert the body here
    }

    public void setPosition
            (position_type post) {
        // insert the body here
    }

    public position_type getPosition() {
        // insert the body here
    }

    public String getCallsign() {
        // insert the body here
    }

    public void setCallsign(String
            sign) {
        // insert the body here
    }

    // automatically generated, do
    // not delete!!
    public Integer autoGetID1() {
        return trackNumber;
    }

}


10.2.4  Object  Wrapper 


Wrapping is an approach to protecting legacy software systems and commercial off-the-shelf
(COTS) software products that requires no modification of those products [1]. It consists of
two parts: an adapter that provides some additional functionality for an application program
at key external interfaces, and an encapsulation mechanism that binds the adapter to the
application and protects the combined components [1].

In this context the software being protected contains the actual distributed objects, and the
AICG model has no way of knowing the behaviors of the distributed object other than the
method names and signatures. The adapter intercepts all invocations to provide additional
functionality such as synchronization between the local and distributed object, transaction
control, events monitoring and exceptions handling. The encapsulation mechanism has been
explained in the earlier section (AICG Architecture). Instead of instantiating the actual
object, the application instantiates the wrapper, which indirectly creates the object. The
wrapper classes are named with the object name with “Ext”, “ExtClient” or “ExtServer”
appended.

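10.3  Event Modules


The event modules consist of the event identification object and the event handler.

10.3.1  Event Identification object

The event identification object distinguishes one event registration from others. When an
application registers its interest in an event, the registration is identified by an event ID and
the event source. Together these two properties uniquely identify the registration. The class
also provides a method to check if two event identification objects are equal, which is used
by the event handler for searching the right event objects from the hash table.

10.3.2  Event Handler

The event handler manages all event operations in the AICG model; it handles the
registration of new events and the deletion of old events. The event identification object is
used as the key to the hash table. This allows fast retrieval of event objects. Events can come
from the space or from other sources. When an object is written into the space, an event is
created by the space and passed to the event handler, which processes it as follows: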


    // called when an external event is
    // "fired".
    public void run() {
        Object source = event.getSource();
        long id = event.getID();
        long seqN = event.getSequenceNumber();

        // create a new event identification
        // object
        eventAICGID keyID = new eventAICGID(id, source);
        registerAICG tempReg;
        String key = keyID.toString();

        // check if the key exists in the
        // hash table (storage)
        if ((tempReg = (registerAICG)
                storage.get(key)) != null) {

            // check if the event is an old or
            // duplicate event
            if (seqN > tempReg.seqNum) {
                tempReg.seqNum = seqN;
                src.listenerAICGEvents(tempReg.anyObj);
            } else {
                // old events ignored
                return;
            }
        }
    }

} // end of notifyHandler
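The duplicate check in the handler above relies on sequence numbers: an event is passed to the application only when its sequence number is greater than the last one seen for the same registration. The following self-contained sketch isolates that idea. `SequenceFilter` and `EventCallback` are hypothetical names and do not appear in the generated AICG code; the string key stands in for the event identification object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical callback, playing the role of the application's listener method. */
interface EventCallback {
    void onEvent(Object payload);
}

/** Delivers an event only if it is newer than the last one seen for its key. */
class SequenceFilter {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final EventCallback callback;

    SequenceFilter(EventCallback callback) {
        this.callback = callback;
    }

    /** Register interest in events identified by key. */
    synchronized void register(String key) {
        lastSeen.put(key, -1L);
    }

    /** Returns true if the event was delivered, false if it was ignored. */
    synchronized boolean deliver(String key, long seq, Object payload) {
        Long last = lastSeen.get(key);
        if (last == null) return false;   // no registration for this key
        if (seq <= last) return false;    // old or duplicate event
        lastSeen.put(key, seq);
        callback.onEvent(payload);
        return true;
    }
}
```

This guards against the repeated or out-of-order delivery that section 9 notes is possible for events crossing JVM boundaries; it does not recover events that were lost in transit.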


10.3.3  The  Callback  Template 


The callback template is used to invoke the application program when certain events of
interest are “fired”. The template to be implemented by the application is a Java interface:

public interface notifyAICG {
    public abstract void
        listenerAICGEvents(Object obj);
}


10.4  The  Transaction  Modules 

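The transaction modules consist of the transaction interface (transactionAICG) and the
transaction manager (transactionManagerAICG). The transaction interface is used to:

1.  Obtain a transaction manager.

2.  Create a default transaction with lease time of 5 seconds.

3.  Create a transaction with a user-defined lease time.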

10.5  The  Exception  Module 

The exception module defines the following types of exceptions:

•  "NotDefinedExceptionCode": an unknown error occurred.

•  "SystemExceptionCode": system-level exceptions, such as disk failure or network failure.

•  "ObjectNotFoundException": the space does not contain the object.

•  "TransactionException": transaction server not found, or transaction expired before
the operation committed.

•  "LeaseExpireException": object lease has expired.

•  "CommunicationException": space communication errors.

•  "UnusableObjectException": object corrupted.

•  "ExistException": there is another object with the same key in the space.

•  "NotificationException": event notification errors.

11.  Conclusion 

The overhead incurred is measured in milliseconds. The high overhead lies within the Java
Virtual Machine (JVM). Nevertheless, AICG is still a viable option in developing interface
wrappers for distributed systems.
Computer Aided Prototyping in a Distributed Environment
Abstract

Previous  work  on  computer-aided  prototyping  system 
(CAPS)  is  stepping  into  a  distributed  environment  to  meet 
the  requirement  of  integrating  legacy  systems  in 
heterogeneous  network.  A  three-module  architecture 
design,  including  Supporting  Database,  System  Tools  and 
Execution  Manager,  is  proposed  in  this  paper  for  the 
distributed  CAPS  system  (DCAPS).  By  using 
wrapper/glue  technique,  different  prototyping  tools  in  a 
heterogeneous  environment  share  the  input/output  data 
files  for  prototypes.  The  architecture  is  generalized  for  the 
communication  among  legacy  systems  for  data 
interchange.  DCAPS  not  only  provides  a  useful  tool  for 
distributed  real-time  system  prototyping,  but  also  is  a 
demonstration  of  distributed  system  in  heterogeneous 
environment. 

Key  words:  software  interoperability,  fast  prototyping, 
distributed  system,  multi-agent  system 

1.  Introduction 

Computer  aided  prototyping  has  been  found  useful  in 
software  development,  especially  for  large  real-time 
systems.  Prototyping  provides  the  capability  to  accurately 
simulate  requirements  in  new  application  areas.  Previous 
work  such  as  the  Computer  Aided  Prototyping  System 
(CAPS)  has  demonstrated  real-time  issues,  software  reuse 
and  process  scheduling  in  fast  prototyping  for  a  single 
processor computing environment. However, it is still
hard  to  make  use  of  existing  systems  in  a  distributed 
environment,  especially  for  real-time  systems  under  a 
heterogeneous  environment.  With  the  fast  development  of 
networks  and  the  Internet,  interoperability  has  become  the 
focus  of  current  research.  This  paper  extends  research  on 
CAPS  to  distributed  and  network  computing. 

Distributed  real-time  software  system  prototyping  and 
interoperability  in  a  heterogeneous  environment  form  the 
focus of this paper. In recent years, hard real-time, soft
real-time and embedded systems have become increasingly
important in various application areas from e-business to
military applications. These systems have strict
requirements on accuracy, safety and reliability. Usually
such software is large and built on several legacy systems
to make use of the partial or full functionalities of these
legacy  systems.  When  the  legacy  systems  are  physically 
located  in  a  distributed  network,  they  are  connected 
through  certain  network  protocols.  Fast  prototyping  of 
these  systems  helps  the  users  in  analysis,  design, 
implementation,  verification,  validation  and  optimization. 
Approaches  for  modeling,  realizing,  reconfiguring  and 
allocating  logical  processes  and  interactions  to  processors 
and  communication  links  are  needed  to  make  prototyping 
useful  in  this  domain. 

This  paper  describes  a  distributed  CAPS  system  (DCAPS) 
to  fulfill  the  requirements  for  distributed  software 
prototyping.  Prototype  System  Description  Language 
(PSDL),  a  prototyping  language,  is  applied  in  the 
description  of  the  real-time  software  in  DCAPS  system. 
PSDL  provides  the  specifications  not  only  for  real-time 
constraints,  but  also  for  the  connection  and  interaction 
among  software  components.  PSDL  has  open  syntax  for 
the  design  of  new  features  that  arise  in  the  context  of 
distributed  computing.  Wrapper  and  glue  technology  is 
applied  for  the  normalization  and  data  transfer  of  legacy 
systems.  A  multi-agent  technique  is  used  to  manage  the 
execution  process. 

Section 2 introduces the three-module architecture of the
DCAPS system. The modules are described in detail in
Sections 3, 4 and 5, respectively. Section 6 gives a simple
example prototype in DCAPS.

2.  System  architecture 

Earlier  work  on  computer-aided  prototyping  system 
(CAPS)  uses  PSDL,  a  prototype  description  language,  to 
describe  the  real-time  software  [4].  PSDL  itself  has  an 
open  structure  so  that  the  user  is  able  to  define  new 
properties  for  software  components,  such  as  new-added 
network  configurations.  CAPS  prototypes  a  software 
system in the following steps. First, the user selects the
software components from the reusable component
libraries to construct the prototype in a graphic editor.
This prototype is saved as a plain text file in PSDL format.
The user may also use the graphical user interface (GUI)
generator provided by CAPS to create a new GUI
for the prototype. Then, the translator and
scheduler  work  on  this  PSDL  file  to  generate  the 
wrapper/glue  code  and  dynamic/static  schedules 
respectively.  Both  the  source  code  of  reusable  components 
and  automatic  generated  source  code  will  be  compiled 
together  to  get  the  executable  final  software.  It  will  be 
tested  in  CAPS  (simulation)  for  both  the  execution 
correctness  and  the  real-time  requirements. 

As  described  above,  CAPS  consists  of  various  prototyping 
tools  to  provide  all  these  functionalities.  They  play 
different  roles  during  the  prototyping  process.  For 
example,  the  scheduler  just  needs  the  information  of 
timing  constraints  for  every  component,  while  the 
translator  does  not  care  about  such  information  other  than 
the  network  configurations  and  data  type  definitions. 
When  new  properties  are  enabled  in  PSDL  description  of 
the  prototype,  for  instance  to  prototype  a  networked 
software,  some  tools  must  be  updated  by  new  generations 
while  the  rest  stay  the  same.  Therefore,  the  architecture  of 
CAPS  must  consider  the  evolution  of  its  own  components. 

CAPS  tools  were  originally  developed  in  SunOS  operating 
system  for  components  which  are  located  on  one 
processor.  To  consider  the  user’s  requirement,  the  user 
interface  is  required  to  migrate  to  Windows  NT  operating 
system.  At  the  same  time,  the  old  operating  system  is  not 
supported  by  some  new  technologies.  To  avoid  the 
complexity  of  migrating  the  whole  system  to  a  new 
operating  system,  CAPS  now  has  to  work  in  a  distributed 
and  heterogeneous  environment.  A  new  architecture 
becomes  important  for  the  system.  On  the  other  hand, 
CAPS is required to prototype software systems in
distributed and heterogeneous environments. The
requirements for developing the distributed CAPS (DCAPS)
are consistent with those for constructing distributed
software prototypes, i.e., DCAPS itself is a demonstration of
distributed software construction. A three-module
architecture is proposed for the design of the distributed
CAPS system (DCAPS).


From  the  viewpoint  of  prototyping  procedure,  DCAPS  can 
group  its  tools  into  three  basic  modules  (Figure  1). 


Figure  1.  Three-module  architecture  design  of  DCAPS 


In  this  architecture,  DCAPS  provides  users  support  from 
three  aspects.  Databases  help  users  to  manage  and  reuse 
the  prototyping  requirements  and  reusable  software 
components.  It  also  validates  the  prototypes  for 
components’  evolution.  Prototyping  tools  help  user  in 
automatically  generating  connection  code,  GUI  code,  and 
data  type  conversion  code  among  components  during  the 
design  process.  Execution  manager  controls  and  visualizes 
the  simulation  process  to  validate  the  system  design, 
particularly  on  real-time  constraints. 

DCAPS  inherits  prototyping  tools  that  were  implemented 
in  different  operating  systems  including  SunOS,  Solaris 
and  Windows  NT.  It  provides  different  user  interfaces  for 
multiple  operating  systems  including  Windows  NT.  All 
the  tools,  which  are  in  the  three  modules,  are  located  in  a 
distributed  environment  during  one  prototyping  job. 

3.  Supporting  databases 

Supporting databases provide intelligent guidance to users;
this guidance is integrated into the system prototyping as a
form of adaptive control. There are two types of database
support  involved  in  DCAPS  system.  One  is  the  software 
reuse  database.  It  contains  the  specifications  for  all  the 
reusable  software  components  so  that  they  are  able  to  be 
retrieved  and  to  be  accessed  during  the  prototyping 
procedure  and  the  execution  (simulation).  Software 
version  control  should  also  be  considered  within  this 
database  support.  The  other  is  the  requirement  database. 

It  allows  users  to  reuse  the  previous  prototypes  that  are 
stored  in  the  database.  Thus  it  may  shorten  the  design 
cycle  and  even  optimize  the  design.  The  decomposition  of 
this  module  is  shown  in  Figure  2. 


Figure  2.  Supporting  database  system 


The browse and retrieve operations for the database
include both syntactic exclusion and semantic exclusion
to narrow the search range.

4.  Prototyping  tools 

The prototyping tools module is decomposed as follows
(Figure 3). It includes the GUI for various operating systems
(which includes a PSDL graphic editor), the prototype
scheduler, the prototype translator (automatic code
generator for data communication among components),
and source code compilers and code optimizers for various
languages and operating systems. The major operating
systems considered in DCAPS are SunOS, Solaris and
Windows NT. The Job Dispatcher works on a server platform
to receive the user's commands from the GUI and to dispatch
jobs to the corresponding tools.

The compiler on each operating system only needs to work
with the corresponding automatically generated code. If
the language used on a specific operating system changes,
the other components of DCAPS do not need to change.


Figure  3.  Decomposition  of  System  Tools 


The DCAPS GUI can be further decomposed as shown in
Figure 4.


Figure 4. Decomposition of DCAPS GUI


The graphic PSDL editor should be enhanced to support
newly added properties in the PSDL description of a
prototype, such as network configuration and different
timing constraints. Even in such cases, the system
architecture does not have to change at all; only the
respective modules are replaced.


The tools, which run on different computers, communicate
with each other over TCP/IP. The wrapper/glue technique
is applied here as well. However, because the tools know
the data types used in their communication in advance,
the wrappers between tools are empty.

5.  Execution  manager 

The execution of the distributed system, i.e., the simulation
of the prototype, is managed by the Execution Manager. It
uses a virtual centralized synchronization timer to
coordinate the task schedules running on different
processors. This subsystem must compensate for clock drift
caused by differences in clock rates, without violating
global timing constraints, as long as clock drift rates
remain within specified bounds. A multi-agent system
coordinates the distributed computing processes.
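
The drift requirement can be made concrete with a standard bound: if every local clock drifts by at most ρ seconds per second, two clocks can disagree by at most 2ρT after T seconds without synchronization, so a skew budget ε is respected when T ≤ ε / (2ρ). The sketch below is an illustration under that assumption only; the class `LocalTimingAgent` and both helper functions are hypothetical names, not part of DCAPS.

```python
def max_skew(drift_bound: float, seconds_since_sync: float) -> float:
    """Worst-case disagreement between two clocks that each drift by at most
    drift_bound (seconds per second), in opposite directions."""
    return 2.0 * drift_bound * seconds_since_sync


def resync_interval(drift_bound: float, skew_budget: float) -> float:
    """Longest gap between synchronizations that keeps skew within budget."""
    return skew_budget / (2.0 * drift_bound)


class LocalTimingAgent:
    """Maps a drifting local clock onto the virtual centralized time."""

    def __init__(self) -> None:
        self.offset = 0.0

    def synchronize(self, local_now: float, central_now: float) -> None:
        # Remember how far the local clock is from the central timer.
        self.offset = central_now - local_now

    def virtual_time(self, local_now: float) -> float:
        return local_now + self.offset


agent = LocalTimingAgent()
agent.synchronize(local_now=100.0, central_now=100.5)
print(agent.virtual_time(101.0))
```

With a drift bound of 50 parts per million and a skew budget of 1 ms, `resync_interval(5e-5, 1e-3)` comes to about 10 seconds.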

The Prototyping Scheduler generates one specific task
schedule (both dynamic and static) for each node. The
Execution Manager provides a centralized Executor to
administer and synchronize the processes on the different
platforms where reusable components are located
(Figure 5). Execution progress is also reported back to the
DCAPS GUI so that the user can watch a visualization of
the process and get clear information about the prototype.


Figure 5. Execution model for a distributed system (the legend marks the local timing agents)

In each node, wrapper/glue technology is applied to the
data interchange of all legacy components (Figure 6). In
DCAPS, software wrappers and glue provide standardized
interactions between legacy systems in a heterogeneous
network, which makes interoperability and integration
possible in a distributed structure. Legacy systems under
the wrappers collaborate by passing messages over the
glue connection. A wrapper gives each legacy system a
generic interface so that its input and output become
uniform, both when it consumes data from other legacy
systems and when it produces data for them. The glue
structure, in turn, supports an abstract data class for data
transfer. At the sender’s end it encodes data of any type
into a common type before putting it into a data stream.
At the receiver’s end the data is decoded to the required
data type, which may differ from the type at the sending
end. Wrapper and glue concepts are the basis of a formal
model for software and hardware co-design.
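
The encode/decode step can be sketched as follows. This is an illustration only: it assumes JSON text as the common type, which the paper does not specify, and the function names `encode` and `decode` are hypothetical.

```python
import json


def encode(value) -> bytes:
    """Sender side: turn any JSON-representable value into the common type."""
    return json.dumps(value).encode("utf-8")


def decode(stream: bytes, required_type):
    """Receiver side: recover the value and convert it to the type it needs."""
    return required_type(json.loads(stream.decode("utf-8")))


wire = encode(21)                 # the sender holds an int
print(decode(wire, float))        # one receiver needs a float
print(decode(wire, str))          # another receiver needs text
```

Because both ends agree only on the common representation, the sender never needs to know which type the receiver will ask for.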

The Prototyping Translator tool automatically generates a
multi-agent system that acts as the “glue” for network
communication of the legacy systems’ inputs and outputs.
Each input/output data flow is assigned an agent that
serves as an automatic pipe for data transmission. The
agent uses the run-time network communication library
that matches the node’s network protocol, as given in the
component information. This “glue” relieves the legacy
systems of any concern with the network settings used to
communicate with other components. Communication among
agents can build on available technologies such as
JavaSpaces and Jini. The technology used in a real
application should be selected according to the actual
network configuration.

The “wrapper” code works with the component to handle
data type control/conversion, firing conditions, exception
handling, timing constraints, etc. The “wrapper” is
composed of several layers so that every feature the user
cares about can be tuned according to the user’s
selections. The “wrapper” communicates with the agents
for outgoing and incoming data. Under certain conditions,
a layer of the wrapper may become transparent when
additional information is available. For example, in the
design of DCAPS, the inputs and outputs of the prototyping
tools are standardized in advance, so data type conversion
is not required. Because DCAPS itself has no real-time
constraints, the wrapper layer for timing constraints is
also transparent.
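
One way to picture a layered wrapper with transparent layers is a chain of optional transformations in front of the component. The sketch below is an assumption about structure, not the generated DCAPS code; `Layer`, `Wrapper` and the example component are hypothetical.

```python
class Layer:
    """One wrapper concern; a transparent layer passes data through untouched."""

    def __init__(self, name, transform, enabled=True):
        self.name = name
        self.transform = transform
        self.enabled = enabled

    def apply(self, data):
        return self.transform(data) if self.enabled else data


class Wrapper:
    """Runs incoming data through every layer, then calls the component."""

    def __init__(self, component, layers):
        self.component = component
        self.layers = layers

    def call(self, data):
        for layer in self.layers:
            data = layer.apply(data)
        return self.component(data)


def show_temperature(celsius):
    # Stand-in for a legacy component that expects a float.
    return f"{celsius:.1f} C"


conversion = Layer("type conversion", float)
wrapper = Wrapper(show_temperature, [conversion])
print(wrapper.call("21"))         # text input is converted first

conversion.enabled = False        # inputs already standardized: layer is transparent
print(wrapper.call(21.0))
```

Turning a layer off leaves the rest of the wrapper unchanged, which mirrors the case where DCAPS tools exchange data in an agreed format.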


For each processor, a local timing agent manages the
execution tasks according to the schedule. The I/O data of
each component passes between the legacy system and its
uniform, automatically generated software wrapper, and is
transferred by the glue agents. The generated glue code
hides the specific network configuration, which is derived
from the design and the network mode and parameters.


6.  Prototyping  example 


A weather station system is prototyped in DCAPS to
demonstrate its ability to prototype distributed software
on heterogeneous operating systems.




Figure  7.  Top  level  of  weather-station  prototype 


Figure  8.  Decomposition  of  sys_b 


Figure  6.  Wrapper/glue  architecture  for  one  component 
Figure 9. Decomposition of sys_a


Figure  10.  Properties  configuration  for  components 


As shown in Figures 7-9, the weather station system
consists of two parts: sys_b is the sensor and sys_a is the
controller. The sensor system includes two sub-sensors, a
wind direction sensor and a temperature sensor. It converts
the measurements into the specified units and reports the
results to the controller. The controller sends the sensor
system a control signal that selects the units, so that the
sensor can be configured automatically. Both subsystems
have their own user interfaces on their local systems.


The two subsystems are located on different computers and
are connected over a network using TCP/IP. A socket
communication run-time library is provided for data
interchange.
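
A rough sketch of the kind of exchange such a library supports, with the sensor sending one report to the controller over a connected stream socket. Everything here is an assumption for illustration: the line-oriented `key=value` message format and the function names are hypothetical, and a local `socket.socketpair()` stands in for the TCP connection between the two machines.

```python
import socket


def send_report(sock, report: dict) -> None:
    """Sensor side: write one report as a single text line."""
    line = ";".join(f"{key}={value}" for key, value in report.items()) + "\n"
    sock.sendall(line.encode("ascii"))


def receive_report(sock) -> dict:
    """Controller side: read one line and turn it back into numbers."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:              # peer closed the connection
            break
        data += chunk
    fields = data.decode("ascii").strip().split(";")
    return {key: float(value) for key, value in (f.split("=") for f in fields)}


sensor_end, controller_end = socket.socketpair()
send_report(sensor_end, {"temperature": 21.5, "wind_direction": 270.0})
report = receive_report(controller_end)
sensor_end.close()
controller_end.close()
print(report)
```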

DCAPS provides a graphical user interface for editing the
prototype at multiple levels. For each component, it
provides an interface (Figure 10) in which the user can
specify properties such as timing constraints, network
configuration, data flow type, etc. The PSDL editor also
supports a GUI code generator so that the user can create
a customized user interface for the prototype.

7.  Conclusions 

The DCAPS system provides a useful tool for rapid
prototyping of distributed real-time software. A
three-module architecture is proposed to make DCAPS
suitable for a distributed environment. The wrapper/glue
method used in DCAPS can be generalized to system
construction and to the interconnection of legacy systems.
By automatically generating the code for the “wrappers
and glue” and providing a powerful environment, DCAPS
allows designers to concentrate on the difficult
interoperability problems and issues, freeing them from
implementation details. It also enables easy
reconfiguration of software and network properties to
explore design alternatives. DCAPS is an ongoing research
project, and its prototyping tools continue to be developed
and refined.
Abstract

Speech technology has been moving steadily into the
domain of the everyday computer user. Computer users
would use speech technology more readily if they could
speak to the machine as they would talk to another
person. With advances in visual agent and natural
language technologies, this is already possible. In this
paper, we present some ideas about a framework for a
type of user interface agent known as a natural language
agent, which combines spoken language understanding and
visual agent technologies into a simple-to-use computer
interface. Preliminary results of two experimental agents
based on the framework are discussed. Future work on
creating complete natural language agent systems is also
outlined.

Keywords:  natural  language  agents,  visual  agents,  speech 
technologies. 

1  Introduction 

If  you  could  decide  how  you  wanted  to  communicate 
with  your  computer,  would  you  really  pick  a  keyboard 
and  a  mouse  as  the  best  way?  Instead,  what  if  we  could 
communicate  with  our  computer  just  like  we  do  with 
people?  We  have  been  trained  for  many  years  to  use  the 
artifacts  of  keyboard  and  mouse  to  interface  with  our 
computers; but that’s the whole point: we’ve been trained
to  use  them.  Instead,  we  should  be  creating  computer 
interfaces  that  adapt  to  the  way  people  communicate  with 
each  other.  In  this  area,  we  are  on  the  cusp  of  a  new  age 
in  human-computer  interaction.  The  technologies 
necessary  to  support  human-like  communication  with  a 
computer  are  slowly  coming  of  age;  and  when  they  do, 
everyone  will  be  able  to  easily  use  a  computer. 

The next generation of human-computer interaction
will allow users to interact with a computer system in
the language they speak with others every day, and to
choose how the computer represents itself to them.
Getting a system to use spoken language as an interface
is just one piece of the puzzle. To communicate
effectively, most human beings need a visual
representation of who or what they are speaking to in
order to feel comfortable with this means of
communication. Visual agents are a natural fit for this
responsibility. When the computer uses a visual avatar as
its interface, the user feels more comfortable with the
interaction because they are now talking to somebody.
Additionally, a visual avatar can use body language and
other body movements to communicate with the user on
another level, just as human beings do [1]. Thus, natural
language agents (NLA) are a type of user interface agent
that combines spoken language understanding and visual
agent technologies to create a simple-to-use computer
interface.

This  paper  proposes  a  framework  for  NLA  in  the 
Microsoft  (MS)  Windows  environment  and  discusses 
some  preliminary  results.  Section  2  covers  the  state  of  the 
component  technologies  for  NLA  as  they  stand  today. 
Section  3  describes  the  proposed  natural  language  agent 
framework  called  Secret  Agent.  Some  preliminary  results 
based  on  the  Secret  Agent  framework  are  given  in  Section 
4.  Finally,  Section  5  concludes  the  paper  with  some 
remarks  on  future  work. 

2  Natural  Language  Agents 

We  have  been  exposed  to  natural  language  agents  of 
all  types  through  TV  and  movies  over  the  years. 
However,  there  are  many  advances  in  fields  other  than 
computer  science,  which  are  necessary  to  support  that 
level  of  technology.  In  the  meantime,  natural  language 
agent  computer  interfaces  can  be  created  using 
technology  available  today  that  will  allow  an  ordinary 
person to communicate with their computer as if it were
another person. Ultimately, the agent could become the
user’s  everyday  friend  and  helper. 

The  components  necessary  to  create  a  basic  natural 
language  agent  include  1)  a  user-selected  visual  agent 
representation,  2)  speech  recognition,  3)  speech  synthesis, 
4)  natural  language  understanding,  5)  an  interface  to  the 
system  the  agent  is  designed  to  help  with,  and  6)  some 
additional utility functions. The visual agent gives the
user a visual persona to speak to during their
interaction, which increases the comfort of
communicating with a computer. Because the user controls
the visual representation of the agent, the user can
customize and select the representation that is most
entertaining or interesting to work with. Next, the
speech  recognition  component  allows  the  agent  to 
translate  the  physical  speech  utterances  of  the  user  into 
meaningful  words  in  the  user’s  language.  Also,  the 
speech  synthesis  allows  the  agent  to  speak  back  to  the 
user  for  complete  spoken  interaction.  Then,  the  natural 
language  understanding  component  is  necessary  in  order 
to  translate  the  words  that  a  user  speaks  into  ideas  and 
concepts,  so  that  the  user  can  make  meaningful  requests 
or  have  a  conversation  with  the  agent.  Finally,  an 
interface  to  the  system,  which  the  agent  is  helping  with, 
allows  the  agent  to  enact  the  requests  that  the  user  might 
make  during  a  session  with  the  agent.  Additionally,  more 
components  can  be  added  to  the  agent  to  increase  its 
functionality  and  usefulness,  including  long-term 
memory,  adaptation  of  conversation  to  user  preferences 
and  work  habits,  conversational  capabilities,  and  others. 
For  some  of  the  technologies  that  were  just  discussed, 
there  are  a  number  of  options  available  for  Microsoft 
Windows-based  components,  such  as: 

•  Visual  Agent  -  MS  Agent  [2],  CSLU  Baldi  [3] 

•  Speech Recognition/Synthesis - MS Speech
SDK [4] supporting IBM, Dragon, MS, and
Lernout and Hauspie speech engines

For  natural  language  understanding,  the  field  is  still  in 
the  research  phase  (see  MIT  [5]  and  CMU  [6])  though 
some  expensive  commercial  work  is  being  done  today  by 
Cycorp  [7].  The  remainder  of  the  natural  language  agent 
components  will  need  to  be  custom  built  until  natural 
language  agent  technology  becomes  more  common. 

3  Secret  Agent  Framework 

In  general,  an  NLA  has  the  structure  shown  in  Figure 
1.  It  includes  all  the  components  discussed  in  Section  2 
and  interfaces  with  both  the  application  and  operating 
system.  The  Secret  Agent  framework  (SAF)  is  designed 
to  encapsulate  the  visual  agent  and  speech  technologies 
that  are  necessary  for  any  NLA  application. 


Figure  1.  Generic  NLA  framework. 


The  goal  of  the  SAF  is  to  base  the  framework  on  the 
most  publicly  accessible  and  standardized  components 
that  could  be  found  for  MS  Windows.  Since  MS  Agent 
and  the  MS  Speech  API  are  the  de  facto  standards  for 
visual  agents  and  speech  in  MS  Windows,  they  are  chosen 
for  the  SAF.  Figure  2  shows  the  structure  of  the  modules 
in  the  SAF.  A  separate  speech  synthesis  module  is  not 
needed  in  this  case,  because  it  is  incorporated  into  MS 
Agent.  Since  the  SAF  incorporates  visual  agent  and 
speech  technologies,  it  can  be  used  in  any  number  of 
applications,  such  as  tutoring  and  personal  assistant 
applications,  that  require  these  technologies. 


Figure  2.  Secret  Agent  Framework  (SAF). 

Since both the visual agent and speech technologies
support the MS Component Object Model (COM) [8], the
SAF is implemented in C++ with direct COM interfaces
for maximum flexibility in controlling the COM objects
provided by the technologies.

Additionally, the SAF provides a number of user-configurable
options that are accessible via a dialog built
into  the  framework.  Using  these  options,  the  user  has  full 
control  over  the  agent  visual  representation,  speech 
recognition  engine  and  speech  synthesis  voice  used  for 
the  agent.  Also,  an  optional  speech  window  allows  the 
user  to  see  what  the  agent  has  heard  so  the  user  knows 
when  the  speech  engine  needs  to  be  trained. 

4  Some  Example  Agents 

The first SAF-based NLA is based on an old BBS door
program called Eliza. Joseph Weizenbaum originally
created Eliza as a challenge to the Turing test. Since Eliza
is based on the Rogerian mode of therapy, in which the
therapist strives to eliminate all traces of his or her
personality from the dialog, Weizenbaum had planned to
show that the test could be beaten through the use of
‘tricks’ instead of true ‘intelligence’ [9]. A conversation
engine that mimics the functionality of the original Eliza
application is built into the NLA. The result is a natural
language agent that responds to everything the user says.
Depending on the complexity and topic of the
conversation, the agent can maintain the illusion of
conversational competence for anywhere from two
responses to an entire conversation, just like the original
Eliza. This NLA uses the same algorithms for the
conversation engine as the original Eliza application, but
adds the extra levels of visual agent and speech
technologies.

The Eliza conversation engine is written in C++ as an
object that communicates with the Secret Agent
framework. Because Eliza produces one response per
utterance, the objects are easily integrated. The only
other  key  feature  of  interest  is  that  Eliza  uses  a  word 
matching  heuristic  to  approximate  conversational 
competency.  Every  word  pattern  handled  by  the 
conversation  engine  is  associated  with  a  standard 
response  that  may  employ  words  in  the  user’s  original 
utterance.  If  words  from  the  user’s  utterance  are  used,  the 
words  involved  are  conjugated  and  transposed  to  make 
the  response  fit  the  user’s  input.  This  is  what  gives  the 
user  the  perception  of  speaking  to  a  psychiatrist.  The 
word  patterns  and  responses  are  stored  in  a  configuration 
file  that  is  loaded  by  the  application  at  startup,  and  can  be 
easily  modified  and  expanded. 
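
The word-matching heuristic can be sketched in a few lines. This is not the paper's C++ engine: the two rules, the reflection table and the default reply below are hypothetical stand-ins for the contents of its configuration file, written in Python for brevity.

```python
import re

# Words swapped so that the user's phrase reads correctly from the agent's side.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are",
               "you": "I", "your": "my"}

# (word pattern, response template); stand-ins for the configuration file.
RULES = [
    (re.compile(r"\bi feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."


def reflect(fragment: str) -> str:
    """Transpose first and second person in the words reused from the user."""
    return " ".join(REFLECTIONS.get(word.lower(), word)
                    for word in fragment.split())


def respond(utterance: str) -> str:
    """Return the response attached to the first pattern that matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return DEFAULT


print(respond("I feel ignored by my boss."))
print(respond("My job worries me"))
print(respond("Hello"))
```

Since every utterance falls through to a default reply, the agent always has something to say, which is what sustains the conversational illusion.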


Figure 3. Conversational NLA.

The second SAF-based NLA stems from the fact that
users will want a natural language agent to help control
the applications that they use every day. For this purpose,
we chose a web browser as the application. Two factors
are behind this selection: the high demand for web-centric
applications in today’s market, and the availability of a
web browser application interface. Using the SAF as the
basis, command-understanding capability, an interface to
a web browser and some limited OS interaction are added.
To approximate natural language commands, the following
methods are prototyped: continuous dictation, a
frame-based grammar and a standard grammar. A standard
grammar is chosen for this NLA because it is simple to
create and leaves little ambiguity in the speech during
use. It is also a standard natural language approximation
technique used by most modern speech recognition
applications.

The  technical  work  on  the  web  browser  agent  is  quite  a 
bit  more  complicated  due  to  the  interface  with  an 
independent  commercial  application.  Microsoft  Internet 
Explorer  is  chosen  as  the  web  browser  in  the  experiment, 
since  MS  provides  classes  that  encapsulate  a 
programmable  interface  to  the  browser  using  COM 
technology [10].


Most  of  the  functions  in  the  programmable  interface 
are  enabled  in  this  NLA.  These  functions  include:  simple 
navigation  commands  (back,  home,  forward,  etc.), 
scrolling  capability  and  application  control  (toolbars, 
modes). The functionality is extended by
allowing the user to navigate hyperlinks on a page
through spoken commands. There are two parts to this
feature.  For  text  links,  a  routine  is  called  after  a  page  is 
loaded  to  dynamically  update  the  grammar  used  by  the 
speech  recognition  engine.  For  other  links  (such  as 
pictures),  another  routine  intercepts  the  incoming  HTML 
page  and  adds  numbers  to  each  of  the  hyperlinks  on  the 
page,  which  can  then  be  spoken  to  navigate  to  those  links. 
Additional  application  capability  is  added  to  allow  the 
user  to  verbally  select  buttons  on  dialog  boxes  that  might 
come  up  during  a  typical  web  browsing  session.  Finally,  a 
simple  help  section  is  added  which  outlines  how  the  agent 
works  as  well  as  a  list  of  supported  commands.  All  the 
help  and  command  information  is  stored  in  text  files 
which  can  be  modified  and  expanded.  A  detailed 
discussion of these two experiments can be found in [11].
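
The link-numbering routine can be pictured with a short sketch. It is an assumption about the approach rather than the agent's actual code, which works through Internet Explorer's COM interface: here a regular expression rewrites the HTML text directly, and the function name `number_links` is hypothetical.

```python
import re

# Matches an opening anchor tag and captures its href target.
ANCHOR = re.compile(r'<a\s+[^>]*href="([^"]*)"[^>]*>', re.I)


def number_links(html: str):
    """Prefix every hyperlink with a spoken number; return page and lookup table."""
    targets = {}

    def tag(match):
        number = len(targets) + 1
        targets[number] = match.group(1)
        return f"[{number}] {match.group(0)}"

    return ANCHOR.sub(tag, html), targets


page = '<p><a href="/news">News</a> <a href="/map"><img src="m.png"></a></p>'
numbered, targets = number_links(page)
print(numbered)
print(targets)
```

The returned table is what a spoken number would be looked up in, so saying "two" here would navigate to `/map` even though that link has no text.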


Figure  4.  Web  browser  NLA. 


5  Conclusion 

The  two  example  agents  are  just  the  tip  of  the  iceberg 
of  what  can  be  done  with  the  SAF  and  other  technology 
available  today.  The  conversation  agent  could  be 
programmed  with  a  better  natural  language  paradigm  to 
allow  it  to  interact  in  a  more  realistic  way  with  the  user. 
The  web  browser  agent  could  be  enhanced  by  creating 
interfaces  to  more  applications  (e-mail,  word  processing, 
etc.).  Both  of  these  experiments  represent  only  two  facets 
of  the  ultimate  goal  for  NLA:  to  create  conversationally 
competent  NLA  that  can  be  used  as  the  complete  interface 
to  a  system. 

In order to achieve the goal of NLA, a number of things
need to happen. First, natural language understanding
engines need to be created that can succeed in the
domains in which users want to use natural language
technology. Next, any application that is to interface
with a natural language agent needs to provide an
interface through which the agent can control the
application. Finally, the agent needs to make the user
feel comfortable by being able to learn and act like
another person. Once these steps are achieved and
integrated, NLAs will become common fare.

So  what’s  next?  In  the  near  future,  NLA  or 
comparable  technology  will  be  a  standard  OS  component 
in  the  consumer  computing  marketplace.  Already, 
products such as IBM ViaVoice Millennium [12] are
filling the void by creating the first commercial versions
of NLA.
Abstract

This paper presents a case study of implementing a large
distributed system in Scheme. Metcast is a request-reply and
subscription system for dissemination of real-time weather
information. The system stores a large amount of weather
observation reports, forecasts, gridded data produced by
weather models, and satellite imagery. A Metcast server
delivers a subset of these data in response to a query
formulated in a domain-specific language. Decoders of World
Meteorological Organization's data feed, the Metcast server,
XML encoders and decoders, auxiliary and monitoring CGI
scripts are all written in Scheme.

This paper considers two examples that demonstrate
benefits of our choice of the implementation language: parsing
of the data feed and a module system for the Metcast server.
We will also discuss extensions to Scheme as well as
performance.


1  Overview  of  Metcast 

Metcast is a request-reply and subscription system for
distributing, disseminating, publishing and broadcasting
real-time weather information [1]. The system comprises
clients and servers communicating over HTTP.
A Metcast server maintains a database of weather
observation reports, forecasts, advisories, gridded data
produced by weather models, as well as satellite imagery and
plain text messages and discussions. A Metcast client uses a
web form or a domain-specific, flexible request language to
retrieve a subset of data from a Metcast database [2]. A
Metcast server, which is an application (web) server, parses
requests, queries the database and sends the requested data
in a single- or multi-part reply. A server may act as a
client to request a subset of data for further redistribution.
Metcast servers are in operation at several U.S. Navy
Meteorology and Oceanography centers worldwide. Clients are
deployed at a great many sites throughout the U.S. Navy as
well as the U.S. Air Force, DoD, NATO, NOAA and other
government agencies.

One particular source of original data is the World
Meteorological Organization’s (WMO) data feed, containing a
great number of land and sea surface and depth/height
profile reports, forecasts, advisories, discussions, etc. - for
the whole globe. A set of decoders processes the feed and
stores raw and decoded data in a database. A Metcast server
distributes this information in an XML OMF format [3].

The Metcast server, the set of decoders for various WMO
data formats, and the auxiliary and monitoring CGI scripts
are all written in Scheme. Metcast clients are written in C++,
Java, Scheme, Perl, Python, JavaScript, and Visual Basic.

The server and related modules are implemented in 12800
lines of Scheme code, counting the comments. WMO data
feed decoders add 8400 more lines. The size of common
extension libraries is 5400 lines of Scheme and some
embedded C code. A Gambit-C 3.0 Scheme interpreter enhanced
with compiled-in extensions has been used throughout the
project.

2  Parsing  of  the  data  feed 

Scheme proved to be particularly helpful in parsing the
WMO data feed. WMO code is a rather old, ad hoc,
peculiar, somewhat inconsistent, tangled data format with a
number of options, exceptions and special cases.
Furthermore, received bulletins often contain errors due to
manual miscoding and transmission problems.

A typical WMO report - for example, a surface synoptic
report - is a sequence of code groups separated by white
space. A code group is a string of letters, numbers and
a few special characters. A code group or groups encode
the result of observation of a particular quantity, e.g., cloud
conditions, temperature, etc. If code groups were atomic
tokens, a report could easily be parsed by an LR(1)
automaton. Alas, code groups are composite entities that
encode information in idiosyncratic ways. The mere
identification of a code group depends on its position and
context, which may encompass all previously seen code groups.

We have implemented a report decoder as a combination
of a table-driven automaton and code-based group parsers.
The latter recognize, parse, and validate a particular code
group. The decoder takes a list of code groups and returns
an associative list, an Abstract Syntax "Tree" (AST). A
special procedure later walks the AST and records the parsed
data in a database upload buffer. Of particular help was
Scheme’s ability to store and pass procedural values like any
other values. This let us implement decoders as
compositions of code group parsers. For example, a very typical
production <a>? <b>* <c>? can be parsed by a combination

(sequence parse-a (sequence (loop parse-b) parse-c)).

This composition of group parsers is represented by a list
(parse-a (repetition-flag parse-b) parse-c). Given this
list and the list of code groups to decode, a main driver walks
both lists, applying the current parser to the current code
group. The result of the application as well as the
repetition flag determine if the current code group is consumed,
if the next parser should be chosen, and how the AST should
be extended.

All the group parsers have the same interface. They
receive as arguments the current code group and the AST,
and should return:

•  an association (a name-value pair) or a list of such
associations to add to the AST;

•  a symbol pass if the parser failed to recognize the code
group. The code group should be given to the next
parser;

•  a value meaning a syntax error is detected at the current
token;

•  a symbol terminate to stop parsing of the report.

In the successful case (the first one above), the current token
is assumed consumed. Any group parser may examine the
AST (that is, the results of the previous parsers) and may
even modify the AST. Therefore our parsing technique is
somewhat similar to attribute grammars. Figure 1 shows
an example of a group parser.
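
The driver and the parser interface can be sketched together. This is a reading of the description above rewritten in Python, not the Metcast code: the marker `REPEAT` plays the role of the repetition flag, `ERROR` is a stand-in for the syntax-error return value, and the three example parsers and their group formats are hypothetical.

```python
PASS, ERROR, TERMINATE = "pass", "error", "terminate"   # ERROR is a stand-in
REPEAT = "repeat"                                        # plays the repetition flag


def decode(spec, groups):
    """Walk the parser list and the code-group list together, building the AST."""
    ast = []
    spec, groups = list(spec), list(groups)
    while spec and groups:
        entry = spec[0]
        repeating = isinstance(entry, tuple) and entry[0] == REPEAT
        parser = entry[1] if repeating else entry
        result = parser(groups[0], ast)
        if result == TERMINATE:
            break
        if result == ERROR:
            raise ValueError(f"syntax error at {groups[0]!r}")
        if result == PASS:
            spec.pop(0)            # the same group goes to the next parser
            continue
        ast.extend(result if isinstance(result, list) else [result])
        groups.pop(0)              # the group is consumed
        if not repeating:
            spec.pop(0)            # a repeating parser stays for the next group
    return ast


def parse_station(group, ast):
    return ("station", group) if group.isdigit() and len(group) == 5 else PASS


def parse_cloud(group, ast):
    return ("cloud", group[1:]) if group.startswith("8") else PASS


def parse_end(group, ast):
    return TERMINATE if group == "=" else ERROR


spec = [parse_station, (REPEAT, parse_cloud), parse_end]
print(decode(spec, ["72403", "81234", "85678", "="]))
```

Because the parsers are ordinary values in a list, a new report type only needs a new list, not a new driver.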

The example demonstrates an and-let* construction (SRFI-2),
which was used frequently throughout the project and
proved very helpful. As Fig. 1 shows, once the current token
has been recognized as a potential <temperature-dew-point>
group, and-let* carries on a sequence of elementary parsing
decisions, all of which must succeed.
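
The "every step must succeed" pattern can be approximated in Python with a chain of guarded bindings. This is only an analogy to and-let*, and the group layout used here (a METAR-style `TT/DD` pair with an `M` prefix for negative values) is an assumption chosen for illustration, not necessarily the layout handled in Figure 1.

```python
import re

PASS = "pass"
FIELD = re.compile(r"(M?)(\d{2})$")


def parse_field(text):
    """Return the signed integer value, or None if the field is malformed."""
    match = FIELD.match(text)
    if match is None:
        return None
    value = int(match.group(2))
    return -value if match.group(1) else value


def parse_temperature_dew_point(group, ast):
    # Each step must succeed; the first failure abandons the whole parse,
    # which is the behaviour and-let* gives in Scheme.
    parts = group.split("/")
    if len(parts) != 2:
        return PASS
    if (temperature := parse_field(parts[0])) is None:
        return PASS
    if (dew_point := parse_field(parts[1])) is None:
        return PASS
    if dew_point > temperature:          # physically implausible, reject
        return PASS
    return [("temperature", temperature), ("dew-point", dew_point)]


print(parse_temperature_dew_point("21/15", []))
print(parse_temperature_dew_point("M02/M05", []))
print(parse_temperature_dew_point("81234", []))
```

Note the explicit `is None` tests: a temperature of 0 is a valid result, so plain truthiness would wrongly reject it.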

The Metcast decoder is continually processing incoming
files, which are delivered every 1-3 minutes. A rather large
batch of reports - 8 plain-text bulletins, 144 sea surface
observation reports, 777 upper-air level data, 2 terminal
airdrome forecasts and 322 synoptic reports - takes 8 wall-clock
seconds to parse and 19 seconds to upload and record into
the database. The platform is a Sun Enterprise-450 server
with two UltraSPARC-II CPUs and 512 MB RAM, running
Solaris 2.6 and an Informix 7.3 database. Keeping in mind that
incoming reports have up to a 10-minute delay from the time
of issue, the total processing time at the Metcast end - under
1 minute - is entirely acceptable.

3  Implementing the Web application server

Scheme turned out to be a good implementation language
for a web application server as well. One part of the server
is a complex finite state machine that decides when a multipart
reply is called for, and sends the corresponding MIME
headers. The problem is not trivial as it is generally
impossible to predict the number of non-empty replies for a
complex request. Expressing such finite automata as sets
of mutually-recursive procedures made the code clear and
flexible.
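The idea of a state machine written as mutually recursive procedures can be sketched as below. This is an illustration in Python, not the Metcast code: the state names, the `text/plain` type for a single reply and the boundary string are assumptions, and since Python does not eliminate tail calls the recursion depth grows with the number of replies. The machine holds the first non-empty reply back until it learns whether a second one exists, because only then can it choose between a plain and a multipart response.

```python
def emit_replies(replies, boundary="part"):
    """Return the output chunks for a sequence of reply bodies (empty string = empty reply)."""
    out = []
    it = iter(replies)

    def no_reply_yet():
        body = next(it, None)
        if body is None:
            return                               # nothing to send at all
        if body:
            holding_one(body)
        else:
            no_reply_yet()

    def holding_one(held):
        body = next(it, None)
        if body is None:                         # exactly one non-empty reply
            out.extend(["Content-Type: text/plain", held])
        elif body:                               # a second reply: switch to multipart
            out.extend([
                f"Content-Type: multipart/mixed; boundary={boundary}",
                f"--{boundary}", held,
                f"--{boundary}", body,
            ])
            in_multipart()
        else:
            holding_one(held)

    def in_multipart():
        body = next(it, None)
        if body is None:
            out.append(f"--{boundary}--")        # closing delimiter
            return
        if body:
            out.extend([f"--{boundary}", body])
        in_multipart()

    no_reply_yet()
    return out
```

Each state is one procedure and each transition is one call, so adding a state means adding a procedure rather than editing a transition table.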

Scheme was conducive to compilation and interpretation
of the S-expression-based Metcast Request Language [2]. A
request language phrase is compiled into a dictionary - an
ordered sequence of bindings - which constitutes the
environment to look up all data needed to construct a
Metcast database query. This hierarchical repository follows
neither the static scope of Scheme expressions, nor the
dynamic scope of procedure activations. Some bindings may
be to procedures, which may push additional associations
into the environment and thus affect further lookups.
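A repository of this kind can be sketched as an ordered list of bindings in which the most recent binding for a key wins and a value may be a procedure. The sketch below is in Python and is only an illustration: the class name `Env`, its methods, and the example keys are hypothetical, and the rule that a procedural binding is called with the environment at lookup time is an assumption about how such bindings could behave.

```python
class Env:
    """An ordered sequence of bindings; later bindings shadow earlier ones."""

    def __init__(self):
        self._bindings = []                  # newest binding first

    def bind(self, key, value):
        self._bindings.insert(0, (key, value))

    def find(self, key, default=None):
        for name, value in self._bindings:
            if name == key:
                # A procedural binding is run on lookup and may itself
                # push further associations into the environment.
                return value(self) if callable(value) else value
        return default


def station(env):
    """Hypothetical procedural binding: choosing a station also sets the units."""
    env.bind("units", "imperial")
    return "KMRY"


env = Env()
env.bind("units", "metric")
env.bind("station", station)
```

Looking up `units` before `station` yields `"metric"`; after `station` has been looked up, the same query yields `"imperial"`, which is the sense in which a procedural binding affects further lookups.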

The Metcast server has a highly modular structure. The main
program is responsible for receiving and parsing of a request,
and packing of replies. Execution of a particular product
request is delegated to a separate module (plug-in). The
hierarchical repository was indispensable in implementing a
parameter bus, which maintains the configuration for the main
server and all plug-ins. The parameter bus also provides a
uniform interface for invocation of modules and passing of a
complex set of explicit and default parameters. For example,
the main Metcast server module contains a form (include
"metar.scm") that loads a plug-in metar.scm. The latter file
defines procedures perform-metar-request and
perform-MSL-request. The file binds these procedures to the
corresponding Request Language verbs and the configuration
information:

(env.bind*
  `((METAR (executor . ,perform-metar-request)
           (mime-type . "text/x-omf"))
    (MSL (executor . ,perform-MSL-request)
         (mime-type . "text/x-msl"))
    (OBJ-LOADER:st_constraint .
      ,(lambda constr-l
         (env.bind st_constraint constr-l)))))

When metar.scm is loaded, the above initialization expression
is evaluated. The Metcast server thus gains an ability
to process requests for METAR and MSL products. The main
server module contains a long chain of (include "xxx.scm")
expressions, which define the set of requests a server accepts.
Adding or replacing support for a particular product
request is as simple as loading or reloading the corresponding
plug-in. This re-configuration and linking-in of the modules
is possible while the server is running - although we have
not pursued this opportunity. The flexible module linking
mechanism was beneficial even in the static case as it made
incremental development and evolution of the server easier.
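The plug-in mechanism amounts to a table from request verbs to an executor and its configuration, filled in when a plug-in is loaded. A minimal Python sketch follows; the function names and the body of the executor are hypothetical, and only the verb `METAR` and the MIME type `text/x-omf` come from the text above.

```python
VERBS = {}   # request verb -> {"executor": callable, "mime-type": str}


def bind_verb(verb, executor, mime_type):
    """Register (or replace) the handler for a request verb."""
    VERBS[verb] = {"executor": executor, "mime-type": mime_type}


def load_metar_plugin():
    """Stands in for loading metar.scm: evaluating the plug-in registers its verbs."""
    def perform_metar_request(station):
        return f"METAR report for {station}"     # placeholder executor
    bind_verb("METAR", perform_metar_request, "text/x-omf")


def serve(verb, *args):
    """Dispatch a request; returns None for verbs no plug-in has registered."""
    entry = VERBS.get(verb)
    if entry is None:
        return None
    return entry["mime-type"], entry["executor"](*args)
```

Because registration simply overwrites the table entry, reloading a plug-in replaces the handler for its verbs without touching the dispatcher.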

4  Extensions  to  Scheme 

Implementing Metcast required several extensions of the
Gambit-C Scheme system: libraries of common procedures,
and interfaces to external applications and the OS. Detailed
descriptions for all extensions along with the commented
source and validation code are freely available from a web
site [4].

We have already mentioned one helpful extension:
and-let*, an and with local bindings, a guarded let* special
form. An input parsing library was another extension. It is a
set of procedures that either skip, or build and return tokens
following inclusion or delimiting semantics. The input
parsing library has been used on many occasions: in
splitting WMO data feed files into bulletins and bulletins into
code groups; in parsing of a QUERY_STRING or HTML form
POST submissions; in breaking the response stream from a
database query into rows and columns of data; in parsing of
XML.
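The two token-building semantics can be illustrated with a small Python sketch. The procedure names below are hypothetical and do not claim to match the library's own interface: a token built with inclusion semantics consists of characters drawn from a given set, while a token built with delimiting semantics runs up to (but not including) the first delimiter.

```python
import io


def _peek(port):
    """Return the next character without consuming it ('' at end of input)."""
    pos = port.tell()
    ch = port.read(1)
    port.seek(pos)
    return ch


def skip_while(chars, port):
    """Skip characters that belong to `chars`; return the character that stopped the skip."""
    while (ch := _peek(port)) and ch in chars:
        port.read(1)
    return ch


def next_token_of(include, port):
    """Inclusion semantics: accumulate characters while they belong to `include`."""
    out = []
    while (ch := _peek(port)) and ch in include:
        out.append(port.read(1))
    return "".join(out)


def next_token(delims, port):
    """Delimiting semantics: accumulate characters until one of `delims` (left unread)."""
    out = []
    while (ch := _peek(port)) and ch not in delims:
        out.append(port.read(1))
    return "".join(out)


def parse_query(query_string):
    """Split a QUERY_STRING-like text into (name, value) pairs."""
    port = io.StringIO(query_string)
    pairs = []
    while _peek(port):
        name = next_token("=&", port)
        value = ""
        if _peek(port) == "=":
            port.read(1)
            value = next_token("&", port)
        skip_while("&", port)
        pairs.append((name, value))
    return pairs
```

The same three primitives are enough to split a feed into code groups (delimiting on whitespace) or to read a run of digits (inclusion on `0123456789`).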

Another kind of extension - made possible by Gambit's
excellent Foreign Function Interface - deals with accessing
processes, files, directories, communication pipes and other
objects external to a Scheme system. Scanning of a POSIX
directory is implemented in a truly Scheme style and spirit:
the OS:for-each-file-in-directory iterator combines the best
features of for-each, map, and filter, and permits premature
termination of iterations.
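One way such a combined iterator could behave is sketched below in Python. The protocol is an assumption made for illustration, not the documented behaviour of OS:for-each-file-in-directory: the callback's result is collected (map), a result of `None` drops the entry (filter), and a distinguished `STOP` value ends the scan early.

```python
import os

STOP = object()   # hypothetical sentinel: returned by the callback to end the scan


def for_each_file_in_directory(path, proc):
    """Apply proc to each entry name in `path`, in sorted order.

    Results that are not None are collected; returning STOP ends the iteration.
    """
    results = []
    with os.scandir(path) as entries:
        for name in sorted(entry.name for entry in entries):
            value = proc(name)
            if value is STOP:
                break
            if value is not None:
                results.append(value)
    return results
```

A single callback thus decides, per entry, whether to act only for effect, to contribute a value, or to stop, which is what makes one iterator cover the three list procedures.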

A very helpful extension that goes far beyond Scheme
is opening and communicating through uni-, bi-directional,
and TCP pipes as if they were regular files. This extension
allows Scheme code to talk to external applications or
internet services. One particular kind of such an external
application is a command-line SQL tool, which allowed us
to build a portable database access library [4]. A database
interface is implemented in a Scheme spirit as well, as a
general iterator over a collection of selected rows.


; <temperature-dew-point> ::= <temp> "/" <dew-point>?
; <dew-point> ::= "M"? <two-digits>
(lambda (token AST)
  (let ((slash-pos (string-index token #\/)))  ; must be at pos 2 or 3
    (if (not (memv slash-pos '(2 3))) 'pass
      (and-let*
        ((negate (lambda (x) (and x (- x))))
         (tempr
           (if (char=? #\M (string-ref token 0))
               (negate (string->integer token 1 3))
               (string->integer token 0 2)))
         (dp-pos (++ slash-pos))
         (dp ...))                      ; clause illegible in the source
        (if ...                         ; test illegible in the source
            (cons 'T tempr)
            (list (cons 'T tempr) (cons 'DP dp)))))))


Figure 1: A <temperature-dew-point> group parser

5  Illusory  and  real  difficulties 

An implementation language other than C or C++
inevitably raises the question of performance. We have run
benchmarks to ascertain the total performance and [...].
For [...] (... 821 Kb of output) took 25.6 sec (real),
24.1 sec (user) and under 0.1 sec of the system time. The
running time comprises: loading and interpretation of [...].
Thus the database interface - however ugly and
inefficient it looks - is not the bottleneck. Parsing of the
[...] Scheme code adds 3.8 sec (real) and 2.2 sec (user)
time. That is noticeable yet [...].

The benchmark also showed that the server start-up
time is under 1 sec [...] process - launching of the Gambit
interpreter, reading of a script and 15 included scripts
totaling 12800 lines [...] and byte-compilation - [...] to be
a significant factor if not the bottleneck. The biggest
surprise was the fact that most of the running time -
20 seconds - was spent within 7 lines of [...] which copy
characters from one stream to another while unescaping
newlines. A makeshift optimization - copying streams
line-by-line rather than character-by-character and utilizing
Gambit's undocumented function [...] - reduced the benchmark
real running time from 25.6 sec down to 17.0 sec.
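The difference between the two copying strategies can be sketched as follows. This Python illustration rests on assumptions the text does not confirm: the escape convention (a backslash followed by `n` stands for a newline, and a backslash followed by any other character stands for that character) is hypothetical, as are the function names. Both functions produce the same output; the second does its work once per line instead of once per character.

```python
import io
import re

_ESCAPE = re.compile(r"\\(.?)", re.DOTALL)   # a backslash and the character after it


def unescape_by_char(src, dst):
    """Copy src to dst one character at a time, unescaping as it goes."""
    while (ch := src.read(1)):
        if ch == "\\":
            following = src.read(1)
            dst.write("\n" if following == "n" else following)
        else:
            dst.write(ch)


def unescape_by_line(src, dst):
    """Same result, but each line is read, rewritten and written in one step."""
    for line in src:
        dst.write(_ESCAPE.sub(
            lambda m: "\n" if m.group(1) == "n" else m.group(1), line))


def run(copy, text):
    """Helper for trying either strategy on an in-memory string."""
    out = io.StringIO()
    copy(io.StringIO(text), out)
    return out.getvalue()
```

An escape sequence never straddles two lines, because a line always ends with the newline that terminates it, so the per-line rewrite sees every backslash together with the character it escapes.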


6  Conclusions 

Implementation of a web application server and WMO
decoders in Scheme showed that the language is up to the
task. The elegance of Scheme and its ability to easily express
guarded execution, finite-state machines as sets of mutually
recursive actions, hierarchical repositories with procedural
bindings turned out to be most important. Built-in garbage
collection, [...], safety, the ease of incremental testing
[...] either. Despite obvious inefficiencies [...], Metcast
server performance is deemed satisfactory by customers.
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SUMMARY 

We  suggest  that  empirical  studies  of  maintenance  are  difficult  to  understand  unless  the  context  of  the 
study  is  fully  defined.  We  developed  a  preliminary  ontology  to  identify  a  number  of  factors  that  influence 
maintenance.  The  purpose  of  the  ontology  is  to  identify  factors  that  would  affect  the  results  of  empirical 
studies.  We  present  the  ontology  in  the  form  of  a  UML  model.  Using  the  maintenance  factors  included  in 
the  ontology,  we  define  two  common  maintenance  scenarios  and  consider  the  industrial  issues  associated 
with them. Copyright © 1999 John Wiley & Sons, Ltd.

KEY WORDS: empirical research; maintenance factors; maintenance scenarios; evolutionary maintenance; independent
maintenance groups; maintenance ontology


1.  INTRODUCTION 

This  paper  arose  from  a  discussion  session  held  at  the  3rd  Annual  Workshop  on  Empirical  Studies 
of  Software  Maintenance  (‘WESS  ’98’).  The  task  of  the  session  was  to  consider  the  question  ‘What 
are  the  differences  between  maintenance  tools/methods/skills  and  those  of  development?’  From  the 
point  at  which  members  of  the  group  stated  their  preliminary  positions,  it  was  evident  that  we  would 
find it difficult to give a single answer. The position statements ranged from what can be paraphrased
as ‘Nothing much’ to ‘Lots of stuff.’
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ST5  5BG,  U.K.  Email:  barbara@cs.keele.ac.uk 


CCC 1040-550X/99/060365-25$17.50
Copyright © 1999 John Wiley & Sons, Ltd.


Received  10  May  1999 
Revised  23  September  1999 


B.  A.  KITCHENHAM  ET AL 


As the discussion continued, it became clear that our difficulties arose from our different views
of what constituted maintenance. We concluded that we could not answer any serious questions
about maintenance methods, tools or skills until we had a description of maintenance rich enough
to encompass all our different experiences of maintenance. We concluded that what we needed was
an ontology of maintenance, that is, a specification of a conceptualisation (Gruber, 1995). This
ontology should not be only a hierarchy of terms, but a framework talking about the maintenance
domain and identifying the factors that affect maintenance, supported by a taxonomy describing the
different factor levels.

We  believe  that  such  an  ontology  would  have  four  major  benefits  for  the  maintenance  research 
community.  It  would: 

1. allow researchers to provide a context within which specific questions about maintenance can
be investigated;

2. help to understand and resolve contradictory results observed in empirical studies;

3. provide a standard framework to assist the reporting of empirical studies in a manner such
that they can be classified, understood and replicated; and

4.  provide  a  framework  for  categorising  empirical  studies  and  organising  them  into  a  body  of 
knowledge. 

Furthermore,  if  we  could  report  our  research  results  in  a  systematic  fashion,  clarifying  the  context 
to  which  the  results  apply,  it  would  also  help  industrial  adoption  of  research  results. 

In Section 2, we present an overview of the ontology. In Section 3, we describe our proposed
maintenance ontology in more detail. In Section 4, we look at two maintenance scenarios and
consider how the ontology can be used to help characterise the difference between the scenarios.


2.  OVERVIEW 

de Almeida, de Menezes and da Rocha (1998) describe the process of constructing an ontology
as involving the following activities:

•  purpose identification and requirement specification;

•  ontology capture and formalisation;

•  integration of existing ontologies; and

•  ontology  evaluation  and  documentation. 

Knowledge captured in an ontology is usually represented in a graphical notation. For instance,
GLEO (Graphical Language for Expressing Ontologies) was used to describe a software process
ontology (de Almeida, de Menezes and da Rocha, 1998).

In this paper, we consider only a part of the ontology construction process. We consider only
purpose identification and requirement specification and ontology capture. Moreover, since we do
not intend to provide a formal description, we present our ontology in a subset of UML (Unified
Modelling Language) notation (Fowler and Scott, 1997) instead of GLEO. UML has been used by
other researchers to describe knowledge. For example, Hasselbring (1999) used UML to describe
knowledge concerned with health care information systems. Since UML is a standard
object-oriented notation, we believe it will make our ideas more accessible to software engineering and
software maintenance researchers. Furthermore, it is possible to improve the representation of the
ontology at a later date by inserting the axioms needed to formalise the whole model.

As a result of our discussions at the WESS ’98 workshop, we identified a number of domain factors
that we believe influence the maintenance process. Figure 1 shows these factors and how they can
be classified. Figure 1 was the starting point for our ontology, which is described in more detail
in Section 3. In order to describe empirical maintenance research, we believe that the maintenance
factors must be specified. This will allow researchers to better understand the maintenance context
and to plan the research needed to investigate the relationships among these factors and the
maintenance context. A better understanding of the relationships that exist between factors and
context should lead both to improvements in the maintenance process and to the development of
new research topics.

The maintenance process describes how to organise maintenance activities. It is similar to the
software development process, but the focus is on product correction and adaptation, not just
on the transformation of requirements to software functionality. We take the same viewpoint
when considering methods and tools. It is not usually necessary to define new methods or
tools to accomplish maintenance activities: conventional software development tools are usually
sufficient. However, the maintenance process defines how these methods and tools should be applied
to maintenance activities, and which skills and roles are necessary to carry out the activities.
Previous research work has considered the definition of methods (Karam and Casselman, 1993),
process description (Pfleeger, 1998), software environment ontology (de Almeida, de Menezes
and da Rocha, 1998), and tool classification (Pressman, 1997). Although these research results
considered the software development process as the basic framework, they are also useful in the
context of the maintenance process.


In order to understand the relationships among maintenance domain factors, we need to specify
each factor and define the impact that it has on maintenance activities. Next, the relationships
themselves can be captured and validated. Validation usually requires empirical studies and
experiments.

Figure 1 has some similarities with the framework for software maintenance suggested by
Haworth, Sharpe and Hale (1992). They defined a framework based on four entities: programmer,
source code, maintenance requirement and environment. They suggested that each of these basic
entities in the framework interacted to a degree with the other entities. Each of the entities and
each combination of possible interactions contribute to a research area and define the type of
attributes that can be manipulated. For example, one area of research is source code attributes,
and another is the interaction between source code attributes and programmer attributes. They
use these areas to classify existing research and discuss the way in which experiments aimed at
considering interactions could be designed. In our ontology, we have generalised the concepts
of maintenance requirement, source code and programmer to maintenance activity, product, and
maintenance engineer respectively. We have also introduced another concept: the maintenance
organisation process. We have omitted an environment entity because our more generalised concepts
include environmental considerations. The main difference between the Haworth, Sharpe and Hale
framework and our ontology is that they are concerned with the structure of empirical experiments.
So, they are not concerned with the nature of the attributes attached to each of their entities, whereas
our main concern is the attributes and the way in which they define the context of empirical research.

3.  THE  MAINTENANCE  ONTOLOGY 

3.1.  Purpose  specification  and  requirements  specification 

Before discussing our conceptualisation of the maintenance domain, we need to consider the
first stage of ontology development, which is purpose specification and requirements specification.
de Almeida, de Menezes and da Rocha (1998) define the activity of purpose specification to be
‘to clearly define its purpose and intended uses, that is, the competence of the ontology’. The
competency of the ontology identifies the questions the ontology is meant to answer.

In  our  case,  the  purpose  of  our  ontology  is  to  identify  contextual  factors  that  influence  the  results 
of  empirical  studies  of  maintenance.  For  example,  suppose  a  researcher  were  investigating  the 
impact  on  productivity  of  new  maintenance  tools  but  did  not  specify  the  experience  of  the  tool  users. 
In  this  case,  it  would  be  difficult  for  other  researchers  to  replicate  the  study,  or  for  practitioners  to 
know whether or not the results were likely to apply in their own situation. Furthermore, it is not
just  the  experience  of  tool  users  that  is  likely  to  affect  the  study's  results  and  their  interpretation. 
Other  factors  that  need  to  be  specified  include  the  type  of  product  being  maintained,  and  the  type  of 
maintenance  tasks  being  performed. 

In  observational  studies  of  maintenance,  researchers  measure  maintenance  performance 
characteristics  such  as  the  quality  of  maintained  products,  or  the  productivity  or  efficiency  of 
the  maintenance  process  for  different  products  or  different  maintenance  activities,  in  order  to 
identify  how  and  why  these  performance  characteristics  vary.  In  controlled  experiments,  researchers 
investigate  the  impact  of  one  or  more  factors  that  they  believe  affect  maintenance  quality  or 
productivity by varying the factors in a systematic fashion, while controlling other factors.

Thus, in order to support empirical studies of both kinds, each factor in our ontology needs to
answer  the  following  competency  question: 

Would variations in this factor (i.e., concept) influence empirical studies of
maintenance productivity, quality or efficiency?

For the purposes of ontology capture, we do not believe it is necessary to identify every possible
interaction between maintenance factors and maintenance performance. However, we do need to
present a reasoned argument explaining at least one interaction for each factor. This can also be
regarded as a contribution to ontology evaluation. Any such explanation would depend on being
able to identify the way in which each element can vary in different circumstances. This implies a
second competency question:

What  is  the  nature  of  the  variations  in  this  factor? 

This  second  question  leads  to  preliminary  taxonomies  of  maintenance  elements.  The  taxonomy  is 
also  intended  to  help  practitioners  identify  whether  or  not  empirical  results  are  likely  to  be  relevant  to 
their  specific  maintenance  situation.  The  two  competency  questions  already  identified  are  sufficient 
to  represent  the  viewpoint  of  practitioners  as  well  as  researchers. 

Finally, we hoped that our taxonomy would also cast some light on our original workshop goal,
which was to consider the differences between maintenance and development from the viewpoint
of skill, tools and methods. This leads to a third and final competency question:

To what extent do maintenance methods/tools/skills differ from those of development?

To  address  this  question  fully,  we  would  need  a  software  process  ontology  as  well  as  a 
maintenance  ontology.  Thus,  we  have  not  addressed  this  competency  question  fully.  We  do, 
however,  point  out  some  of  the  differences  we  found  between  our  maintenance  ontology  and  the 
de  Almeida,  de  Menezes  and  da  Rocha  software  process  ontology,  and  identify  some  concepts  that 
are  of  relevance  only  to  maintenance. 

The following sections define our ontology. Because the domain is very complex, we describe
each main dimension shown in Figure 1 separately, with the final integrated ontology shown later
in Figure 7. In the next sections we present our ontology of software maintenance with definitions
of all the main concepts (i.e., maintenance factors). Where possible, we make use of definitions and
concepts used by de Almeida, de Menezes and da Rocha (1998) in their software process ontology.
We also consider the different properties of the maintenance factors that impact the maintenance
process and can thus affect the results of empirical studies.


3.2.  Maintained  product 

3.2.1.  Overview 

Figure  2  shows  our  product  ontology.  Table  1  defines  the  concepts  used  in  the  ontology. 
Characteristics  of  these  elements  that  affect  maintenance  performance  are  discussed  in  the  following 
sections. Note that in their software process ontology, de Almeida, de Menezes and da Rocha do not
consider  the  relationship  between  the  total  product  and  its  composite  artefacts. 


3.2.2.  Product  size 

The size of the product affects the number and organisation of the staff needed to maintain it.
Table 2 suggests a coarse-grain size measure for classification purposes. There are relationships
between the size measure and maintenance team organisation. For example, geographically
distributed maintenance teams usually maintain large products. The size of the enhancements and
the size of the product are likely to affect maintenance productivity. The larger the product the more
likely it is that product knowledge will be spread unevenly among the maintenance staff, making
it more difficult to diagnose the cause of some problems and identify all the modifications needed
to support a large enhancement. In addition, when many people are working together on a large
enhancement, there are more opportunities for misunderstandings that can lead to quality problems.
Thus, maintenance activities on large products may be less productive than maintenance activities
on small products.


Figure 2. The maintained product ontology


Table 1. The maintained product ontology definitions

Concept           Definition

Product           The product is the software application, product or package that is
                  undergoing modification. A product is a conglomerate of a number of
                  different artefacts.

Product upgrade   A change to the baseline product that implements or documents a
                  maintenance activity. An upgrade may be a new version of the product,
                  an object code patch, or a restriction notice.

Artefact          Artefacts that together correspond to a software product can be of the
                  following types: documents (which can be subdivided into textual and
                  graphical documents), COTS products, and object code components.
                  Textual documents include source code listings, plans, design and
                  requirements specifications.


Table 2. Product size

Product size      Maintenance team size

Small             1 person
Medium            1 team
Large             Multiple teams


3.2.3.  Application domain

Many researchers (e.g., Maxwell, van Wassenhove and Dutta, 1996) have observed major
productivity differences between products from different application domains. We believe such
differences apply to maintenance activities as well as development activities. In addition, the
application domain (e.g., finance, telecommunications, command and control, etc.) places domain
knowledge requirements on maintenance human resources. It also places constraints on the
maintenance artefacts and product. For example, safety critical system maintenance must, at all
cost, preserve software reliability requirements, whereas in the telecommunications world there is
more emphasis on fast upgrades to software in order to minimise time to market. These different
constraints mean that different aspects of maintenance performance are optimised.


3.2.4.  Product age

The age of a product (i.e., the age in years since first release) can affect maintenance in different
ways:

•  If the development technology is very old, it may be difficult to find maintenance human
resources with skills in the old technology (hence, the practice of ‘grey-sourcing’ the
maintenance of some products by bringing older programmers out of retirement). In addition,
it may be difficult to find support tools, such as compilers and static analysers, and support
for the tools.

•  If the product is old, it may be difficult to access the original developers or the original
development documentation. This can lead to products or parts of products that no one
understands well enough to change.

Thus, in general we expect maintenance performance to be better for younger than older products.


3.2.5.  Product maturity

Product maturity is different from product age. It concerns the life cycle of a product after initial
release. The basic phases in the life of a product and their relationship with maintenance tasks
and user population are summarised in Table 3, which is similar to the life cycle described by
Kung and Hsu (1998). The maintenance life cycle starts at first release and ends when a product
is withdrawn from use. It is important to note that large enhancements cause mini-cycles, where
a product can be forced back into periods of infancy and adolescence as a result of poor quality
product releases. Table 3 suggests that the type of maintenance tasks undertaken by an organisation
is related to the maturity of a product, as is the size of its user population. Note that a consideration of
user population is irrelevant for some custom-built products that have a single client-single mission
profile.
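When the context of a study is recorded, the maturity factor and the expectations attached to it in Table 3 can be held in a small lookup structure. The Python sketch below is only one possible encoding: the stage names, task labels and user-population values are those of Table 3, whereas the class, the dictionary and the helper `same_task_mix` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Maturity(Enum):
    INFANCY = "infancy"
    ADOLESCENCE = "adolescence"
    ADULTHOOD = "adulthood"
    SENILITY = "senility (legacy)"


# Stage -> (prevalent maintenance tasks, user population), as listed in Table 3.
LIFE_CYCLE = {
    Maturity.INFANCY: (("corrections",), "small"),
    Maturity.ADOLESCENCE: (("corrections", "requirement changes"), "growing"),
    Maturity.ADULTHOOD: (("new requirements", "implementation changes"), "maximum"),
    Maturity.SENILITY: (("corrections",), "declining"),
}


def same_task_mix(stage_a, stage_b):
    """True when two products are expected to see the same prevalent task types."""
    return LIFE_CYCLE[stage_a][0] == LIFE_CYCLE[stage_b][0]
```

A check of this kind is one simple way to flag comparisons between studies whose products sit at stages with different prevalent tasks.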


3.2.6.  Product  composition 

The  level  of  abstraction  of  the  component  artefacts  of  a  product  affects  the  skills  required  by 
maintenance  engineers  and  the  tools  they  need  to  support  them.  If  products  are  generated  from 
designs,  maintenance  engineers  need  access  to  the  code  generation  tools.  If  the  product  is  composed 
of  black  box  components  (e.g.,  a  COTS  product),  maintenance  engineers  need  integration  skills 
rather  than  coding  skills. 
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Table  3.  Maintenance  life  cycle 


Life cycle stage                                   Maintenance task       User population
                                                   prevalence

Infancy: after release, initial users start        Corrections            Small
reporting defects.

Adolescence: as the user population grows,         Corrections,           Growing
defect reports still predominate but there may     requirement changes
be changes to amend the system behaviour.

Adulthood: the product is relatively defect        New requirements,      Maximum
free, but if it is accepted by a wide user         implementation
population there will be requests for new          changes
functionality. In addition, as change
accumulates there will be a need to restructure
parts of the system to avoid design decay, so
implementation changes to improve code
structure may be required.

Senility (legacy): there are newer products        Corrections            Declining
available and only a few users remain to be
supported. Usually only corrective maintenance
and workarounds are provided.


3.2.7.  Product and artefact quality

The  original  software  development  process  and  the  quality  of  the  product  it  delivered  place 
constraints  on  the  subsequent  maintenance  process.  In  our  experience  it  is  easier  to  maintain  a 
good  quality  product  than  a  poor  quality  product,  where  'quality'  includes  issues  such  as  product 
structure,  documentation,  and  the  quality  of  individual  artefacts.  Furthermore,  the  less  contact  a 
maintenance  organisation  has  with  the  original  software  developers,  the  more  it  is  dependent  on 
the  availability  of  good  quality  documentation,  bearing  in  mind  that  there  are  many  different  forms 
of  documentation  associated  with  a  software  product.  In  terms  of  defining  the  impact  of  document 
quality  on  maintenance  activities,  we  need  to  assess  the  extent  to  which  documentation  is: 

• complete,

• accurate, and

• readable.


For  old  products,  documentation  is  often  poor  or  non-existent.  In  such  cases,  maintenance 
engineers  need  specialised  tools  such  as  re-engineering  tools.  Thus,  comparisons  of  maintenance 
performance  across  different  products  will  be  of  limited  value  unless  it  is  clear  that  the  maintenance 
tool  requirements  of  each  product  have  been  met  to  an  equivalent  degree,  and  that  the  quality  of  the 
component  artefacts  is  comparable. 


3.3. Maintenance activities

Figure 3 shows our maintenance activity ontology, which is derived from de Almeida, de Menezes
and da Rocha's software development activity ontology. We have amended that ontology to consider
maintenance activities rather than software construction activities, and have omitted elements that
do not have any major impact on maintenance performance. In particular, we have added the
concept of an investigation activity, and, instead of having a construction activity, we have a
maintenance activity. Furthermore, we have identified configuration management as one of the
types of management activity. We have also included the resource concept in this ontology, whereas
de Almeida, de Menezes and da Rocha (1998) had a separate resource ontology. Definitions of
the elements in the ontology are given in Table 4. A discussion of the impact of the elements on
maintenance performance follows.

Figure 3. The maintenance activity ontology

In our view, one of the major differences between software development and software
maintenance is that development is requirement-driven and maintenance is event-driven. This means
that the stimuli (i.e., the inputs) that initiate a maintenance activity are unscheduled (random) events.

Input events usually originate from the users (or client or customer) of the software application,
but may also originate from maintenance human resources (engineers or managers). Thus, the first
activity needed by a maintenance process (after the administrative process of logging the event) is
an investigation activity, whereby a maintenance engineer is assigned to assess the nature of the
event, which can be either a problem report or a change request. On completion of an investigation
activity, maintenance managers must decide whether or not to proceed with a maintenance
modification. This is discussed in more detail in Section 3.4.3.
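
The event-driven sequence described above (log the event, investigate it, then decide whether to proceed) can be sketched as a small state machine. This is only an illustrative sketch: the names `MaintenanceEvent`, `EventKind` and `Status`, and the rule that each step must follow the previous one, are assumptions made for the example and are not part of the ontology.

```python
from dataclasses import dataclass
from enum import Enum, auto


class EventKind(Enum):
    PROBLEM_REPORT = auto()
    CHANGE_REQUEST = auto()


class Status(Enum):
    LOGGED = auto()
    INVESTIGATED = auto()
    APPROVED = auto()
    REJECTED = auto()


@dataclass
class MaintenanceEvent:
    """An unscheduled input event (hypothetical record layout)."""
    kind: EventKind
    description: str
    status: Status = Status.LOGGED
    findings: str = ""

    def investigate(self, findings: str) -> None:
        # Investigation is the first activity after the event is logged.
        if self.status is not Status.LOGGED:
            raise ValueError("only a logged event can be investigated")
        self.findings = findings
        self.status = Status.INVESTIGATED

    def decide(self, proceed: bool) -> None:
        # Managers decide only after the investigation is complete.
        if self.status is not Status.INVESTIGATED:
            raise ValueError("decision requires a completed investigation")
        self.status = Status.APPROVED if proceed else Status.REJECTED
```

A caller would create the event when it is logged, call `investigate` with the engineer's findings, and then call `decide` with the managers' decision.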

Table 4. Maintenance activity ontology definitions

Activity
    An action of one of the following types: an investigation activity, a modification
    activity, a management activity, or a quality assurance activity. An activity may
    be made up of a number of sub-activities. Usually, it takes as input one or more
    existing artefacts and outputs zero, one or many new or modified artefacts.

Investigation activity
    An activity that assesses the impact of undertaking a modification arising from a
    change request or problem report.

Modification activity
    An activity that takes one or more input artefacts and produces one or more
    output artefacts that, when incorporated into an existing system, change its
    behaviour or implementation.

Management activity
    An activity related to the management of the maintenance process or to the
    configuration control of the maintained product (see Figure 5 and Table 6).

Quality assurance activity
    An activity aimed at ensuring that a modification activity does not damage the
    integrity of the product being maintained. Quality assurance activities may be
    classified as testing or certification activities (entity omitted from Figures 3
    and 7).

Resource
    Everything that is used to perform an activity. Resources may be hardware,
    software or human resources.

Maintenance modifications are often referred to as corrective, adaptive or perfective following
Swanson's typology (Swanson and Chapin, 1995). However, since identifying a modification as
an adaptive or a perfective maintenance activity depends on the reason for the change, and not
on an objective characteristic of the change, we have used the following definition for types of
maintenance changes:

• Corrections that correct a defect, i.e., a discrepancy between the required behaviour of a
product/application and the observed behaviour.

•  Enhancements  that  implement  a  change  to  the  system  that  changes  the  behaviour  or 
implementation  of  the  system.  We  subdivide  enhancements  into  three  types: 

•  enhancements  that  change  existing  requirements, 

•  enhancements  that  add  new  system  requirements,  and 

•  enhancements  that  change  the  implementation  but  not  the  requirements. 

Broadly speaking, enhancements that are necessary to change existing requirements can be
equated to Swanson's perfective maintenance changes. Those that are necessary to add new
requirements to a system can be equated to adaptive maintenance. Changes that do not affect
requirements but only affect implementation might be referred to as preventive maintenance (by
analogy to what happens when you have your car serviced). Note that corrections may result in
similar types of product modifications, but we do not feel that it is necessary to define correction
subtypes.
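
The classification above can be expressed as a small decision function. The sketch below is illustrative only: the names `ChangeType`, `classify_change` and `SWANSON_EQUIVALENT` are hypothetical, and the precedence used when a change both adds and alters requirements (treated here as a new requirement) is an assumption, since the text does not rank the enhancement subtypes.

```python
from enum import Enum


class ChangeType(Enum):
    CORRECTION = "correction"
    ENHANCEMENT_CHANGED_REQUIREMENT = "enhancement: changed requirement"
    ENHANCEMENT_NEW_REQUIREMENT = "enhancement: new requirement"
    ENHANCEMENT_IMPLEMENTATION = "enhancement: implementation only"


# Rough correspondence with Swanson's terms, as described in the text.
SWANSON_EQUIVALENT = {
    ChangeType.CORRECTION: "corrective",
    ChangeType.ENHANCEMENT_CHANGED_REQUIREMENT: "perfective",
    ChangeType.ENHANCEMENT_NEW_REQUIREMENT: "adaptive",
    ChangeType.ENHANCEMENT_IMPLEMENTATION: "preventive",
}


def classify_change(fixes_defect: bool,
                    changes_existing_requirement: bool,
                    adds_new_requirement: bool) -> ChangeType:
    """Classify a modification by what it does, not by why it was requested."""
    if fixes_defect:
        # A defect is a discrepancy between required and observed behaviour.
        return ChangeType.CORRECTION
    if adds_new_requirement:
        # Assumed precedence: new requirements outrank changed ones.
        return ChangeType.ENHANCEMENT_NEW_REQUIREMENT
    if changes_existing_requirement:
        return ChangeType.ENHANCEMENT_CHANGED_REQUIREMENT
    # Requirements untouched, so only the implementation changes.
    return ChangeType.ENHANCEMENT_IMPLEMENTATION
```

For instance, a restructuring that leaves the requirements untouched is classified as `ChangeType.ENHANCEMENT_IMPLEMENTATION`, which the mapping relates to preventive maintenance.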

There is not a one-to-one relationship between problem reports and corrective maintenance.
Sometimes, the 'problems' noted by users are requests for behaviours that were not originally
required. In such cases, the problem report leads to an enhancement rather than a correction. It
is important to determine whether maintenance work is a correction or an enhancement because the
activities are often budgeted separately. In fact, many of the disputes between the customer/client
and maintainers revolve around whether a change is a correction or an enhancement. If the
customer/client did not fully and unambiguously define the required behaviour, it is often difficult
to decide whether a modification is a correction or an enhancement.

Characteristics of maintenance activities that affect the productivity and efficiency of maintenance
activities include the size of the modification and the criticality of the modification. Large
enhancements, particularly large enhancements of large products, are likely to require effort from
several different maintenance engineers, and will thus incur coordination and communication
overheads. Smaller enhancements that can be performed within schedule by one maintenance
engineer are usually more productive. The criticality of an enhancement or correction impacts
the elapsed time it takes for the modification to be delivered to users, since the scheduling of the
modification will be determined mainly by its criticality.
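
The observation that scheduling is determined mainly by criticality can be sketched as a priority queue. This is an illustration under stated assumptions: the class name `ModificationQueue`, the integer criticality scale (larger means more critical) and the first-come tie-break are all hypothetical choices, not something the ontology prescribes.

```python
import heapq
from itertools import count


class ModificationQueue:
    """Schedules approved modifications, most critical first."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier arrivals go first

    def add(self, name: str, criticality: int) -> None:
        # heapq is a min-heap, so the criticality is negated.
        heapq.heappush(self._heap, (-criticality, next(self._arrival), name))

    def next_modification(self) -> str:
        # Pops the most critical modification still waiting.
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

With this ordering, a low-criticality modification waits behind every more critical one, which is why criticality drives the elapsed time to delivery.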

To accomplish the different maintenance activities, maintenance engineers require different
degrees of product understanding and different types of development tools. A corrective activity may
require only the ability to locate faulty code and make localised changes, whereas an enhancement
activity may require a broad understanding of a large part of the product (Singer, 1998). In the first
case, a maintainer will require testing or simulation tools to recreate the problem and debugging
tools to step through suspect code. In the second case, a maintainer's tool requirements will
depend on the quality of the development documentation, and the availability of the development
environment. If the maintainer has poor documentation and little of the original development
environment, he/she may require re-engineering tools and/or code navigation and cross-referencing
tools.

The efficiency and quality of investigation activities depend on the maintenance engineer
knowing the current status of patches and planned modifications that apply to the part of the product
involved with the new problem report or change request. The availability of such information
depends on the effectiveness of the product configuration control and change control process. A
good configuration control process is necessary to identify the status of each product component,
including information such as the currently applied patches. A formal change control process might
slow down the rate at which the maintenance process responds to input stimuli, but may improve
the ability of the change control and maintenance processes to preserve the integrity of the product
under maintenance and its constituent artefacts.


3.4.  Software  maintenance  process 

3.4.1.  Two  processes 

Within  a  software  maintenance  department,  there  are  two  different  maintenance  processes: 

•  the  maintenance  process  used  by  individual  maintenance  engineers  to  implement  a  specific 
modification  request,  and 

•  the  organisation  level  process  that  manages  the  stream  of  maintenance  requests  from 
customers/clients,  users  and  maintenance  engineers. 

We consider both types of process separately. In order to use terminology similar to that used
by de Almeida, de Menezes and da Rocha (1998), we refer to our definition of the first process
as the software maintenance procedure ontology (see Figure 4). De Almeida has no equivalent to
the second process in his ontology. We refer to the second process as the maintenance organisation
process (see Figure 5).

Figure 4. The maintenance procedure ontology

Table 5. Maintenance procedure ontology definitions

Development technology
    The technology used when the product and its constituent artefacts were
    originally constructed, for example, knowledge-based system technology,
    conventional data processing technology. The original development
    technology constrains the possible maintenance procedures.

Paradigm
    The philosophy adopted during the original construction of the maintained
    product, for example, the object-oriented paradigm or procedural paradigm.
    The original paradigm constrains the possible maintenance procedures.

Procedure
    The conduct followed to perform an activity. A procedure may be classified
    as a method, technique or script. A procedure may be adopted to perform a
    specific activity from a set of possible procedures.

Method
    A systematic procedure defining steps and heuristics to permit the
    accomplishment of one or more activities.

Script
    A guideline for constructing/amending a specific type of document.

Technique
    A procedure used to accomplish an activity that is less rigorously defined than
    a method.


3.4.2.  Software  maintenance  procedure 

The  software  maintenance  procedure  ontology  shown  in  Figure  4  is  used  to  modify  one  or  more 
artefacts  in  order  to  implement  a  required  software  modification.  The  concepts  shown  in  Figure  4  are 
defined  in  Table  5.  The  definitions  have  been  adapted  from  de  Almeida,  de  Menezes  and  da  Rocha's 
definitions. 

Artefacts are not solely source and object code items. They comprise documents, system
representations and plans, etc., constructed throughout the software development process, and
modified during maintenance. A variety of different scripts, methods and techniques are used to
construct and modify such artefacts, and they are usually available to support maintenance activities.

Copyright © 1999 John Wiley & Sons, Ltd.
J. Softw. Maint: Res. Pract. 11, 365-389 (1999)

Maintenance  activity  performance  will  be  affected  by  the  choice  of  software  development 
technology  and  development  paradigm.  It  will  also  be  affected  by  the  extent  to  which  procedures 
are  automated.  In  general,  development  technologies  such  as  the  development  language  and  the 
development  paradigm  place  constraints  on  maintenance  activities,  and  skill  requirements  on 
maintenance human resources. The ISO/IEC 12207 Standard defines an 'activity' as a life cycle
phase and a 'task' as something done as part of an activity. Here we are using only the term
'activity', but an activity can be decomposed into smaller activities, therefore capturing the ISO/IEC
definitions.

In  addition,  the  chosen  development  technology  may  present  a  significant  risk  to  product 
maintainability.  A  software  product  cannot  continue  to  be  maintained  if  its  development 
environment is not available to its maintainers. For products with a long lifetime it is necessary
to ensure that technologies such as compilers, code generators and CASE tools will themselves be
supported  throughout  the  estimated  lifetime  of  the  product. 


3.4.3. Maintenance organisation processes

Figure  5  shows  the  maintenance  organisation  process.  Table  6  briefly  defines  the  concepts  used 
in  the  model. 

A maintenance organisation must handle a stream of maintenance requests from users, customers
and maintainers. Thus, a major element of a maintenance organisation is event management
(Niessink and van Vliet, 1998). Another major element of a maintenance organisation is
configuration management. Configuration management is the process responsible for releasing new
system versions and system amendments to users. In addition, configuration control systems need
to protect the integrity of the product when it is being modified. In particular, they need to ensure
that  maintenance  engineers  know  the  current  repair  status  of  the  product  and  product  components. 
If  the  configuration  control  system  is  inadequate,  maintenance  activities  will  be  less  efficient  and 
there  is  a  danger  that  product  quality  will  be  compromised. 

In addition, there needs to be a management process for authorising or rejecting modification
activities after initial investigation of the trigger event. This is usually the responsibility of a change
control board. The authorisation process may also include a process of negotiation with the client
about contractual arrangements for implementing a required modification (e.g., budgets/price and
time-scales). Only after a proposed modification activity is approved by the change control board
and any necessary contractual arrangements are agreed with the client (which, for applications like
operating systems or self-standing products, may be the marketing department), will the proposed
modification activity be scheduled. A change control board can be organised as a formal process
involving meetings between users and customers/clients and maintenance managers, or as a simple
working procedure. The level of formality can affect quality and efficiency. Formal change control
boards are likely to slow the maintenance process but are better able to protect the integrity of the
product being maintained.

The efficiency of maintenance management activities is affected by the use of support tools.
Most organisations have configuration control tools. There are also many tools to assist event
management. For example, many maintenance organisations use 'help desk' tools, which allow
events to be logged into an organisation and their progress tracked through the various maintenance
tasks needed to resolve the event. Another type of tool that supports the interface between the user
population and a maintenance organisation is a 'known error log', which identifies all currently
known errors and their workarounds or fixes.
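
As an illustration of the 'known error log' idea, the sketch below stores each known error with its workaround or fix and returns advice for a reported error. The record layout (`error_id`, `summary`, `workaround`, `fixed_in`) and the class names are assumptions made for the example; they do not describe any particular help desk product.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class KnownError:
    """One entry in the log (hypothetical record layout)."""
    error_id: str
    summary: str
    workaround: Optional[str] = None
    fixed_in: Optional[str] = None  # release that contains the fix, if any


class KnownErrorLog:
    def __init__(self) -> None:
        self._entries: Dict[str, KnownError] = {}

    def record(self, entry: KnownError) -> None:
        # A later record for the same id replaces the earlier one.
        self._entries[entry.error_id] = entry

    def advice(self, error_id: str) -> Optional[str]:
        """Return advice for a known error, or None if it is not in the log."""
        entry = self._entries.get(error_id)
        if entry is None:
            return None
        if entry.fixed_in:
            return "fixed in " + entry.fixed_in
        if entry.workaround:
            return "workaround: " + entry.workaround
        return "known error, no workaround yet"
```

Support staff could consult such a log before raising a new maintenance event, so that already known problems are answered with a workaround rather than investigated again.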

The  volume  and  type  of  maintenance  requests  affect  the  performance  of  the  maintenance 
organisation. For example, if there are a large number of defects reported, there may be insufficient
resources  to  undertake  perfective  or  preventive  modifications. 

Service level agreements define the maintenance organisation's performance targets. Differences
in achieved performance level may, therefore, be due to different performance targets. Maintenance
organisations must be engineered to meet their service level agreements. This is often done by
separating various support activities into well-defined roles that can be performed by staff with
specialised skills. For example, many maintenance organisations use the concept of support levels
to separate staff whose main concern is to support the user population from those concerned with
correcting or enhancing software.

Table 6. Maintenance organisation process ontology definitions

Service level agreement
    An agreement between the providers of a maintenance service and the
    customers of a maintenance service that specifies the performance targets
    for the maintenance service.

Maintenance management
    The process used to manage the maintenance service (as opposed
    to the procedure used to manage individual maintenance requests).
    The organisation process is established and maintained by senior
    maintenance managers. It is responsible for defining the structure of
    the maintenance organisation such that it can fulfil its service level
    agreement. Maintenance management has three main concerns other than
    the normal concerns of quality assurance and project management: event
    management, configuration control and change control.

Event management
    Event management is the process responsible for handling the stream of
    events received by the maintenance organisation.

Change control
    Change control is the process responsible for evaluating the results of
    maintenance event investigations and deciding whether or not to approve
    a product modification.

Configuration management
    Configuration management is responsible for maintaining the integrity
    of the product in terms of its version and modification status. It is also
    responsible for the production of product upgrades.

Maintenance organisation structure
    The roles undertaken by maintenance human resources in a maintenance
    organisation in order to perform the required administrative procedures.

Maintenance event
    A problem report or change request originating from a customer or user
    of the maintained product or a member of the maintenance organisation.

Investigation report
    The outcome of investigating the cause and implications of a maintenance
    event.

At its simplest there may just be two support levels:

• Level 1: this level provides the personnel who staff the help desk.

• Level 2: this level provides the personnel who make changes to software.

However, it is more common to have at least three support levels:

• Level 1: the help desk staff are non-technical, and are responsible for logging problems and
identifying the technical support person most likely to be able to assist a user.

• Level 2: the technical support personnel know how to communicate with users and
understand their problems, and they can advise on workarounds and quick fixes.

• Level 3: the maintenance engineers are authorised to make changes to the product.

The separation of maintenance services across different service levels makes it clear that not all
maintenance work results in product modification. Users may simply require advice about how to
use the product or how to circumvent a known problem with the product. The number of levels
and the specific roles they support affect the performance of the maintenance service. For example,
if there are too many levels there may be an unacceptable delay in responding to certain types of
maintenance request.
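
The three-level arrangement can be sketched as an escalation path, which also makes the delay argument concrete: every extra level adds a hand-off. The sketch assumes, for illustration only, that every request enters at Level 1 and passes through Level 2 before reaching Level 3; the names `SupportLevel`, `escalation_path` and `handoffs` are hypothetical.

```python
from enum import IntEnum
from typing import List


class SupportLevel(IntEnum):
    HELP_DESK = 1                # logs problems and routes the user
    TECHNICAL_SUPPORT = 2        # advises on workarounds and quick fixes
    MAINTENANCE_ENGINEERING = 3  # authorised to change the product


def escalation_path(needs_product_change: bool) -> List[SupportLevel]:
    """Levels a request passes through (assumed strictly sequential)."""
    path = [SupportLevel.HELP_DESK, SupportLevel.TECHNICAL_SUPPORT]
    if needs_product_change:
        # Only requests that require a modification reach Level 3.
        path.append(SupportLevel.MAINTENANCE_ENGINEERING)
    return path


def handoffs(path: List[SupportLevel]) -> int:
    # Each transfer between levels is a potential source of delay.
    return len(path) - 1
```

A request for usage advice stops at Level 2 after one hand-off, whereas a request that needs a product change incurs two.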

The other main role for a maintenance organisation is the planning and scheduling of maintenance
releases. This involves identifying the content of different releases and a release cycle that is
appropriate to customer requirements. Factors such as the interval between scheduled maintenance
releases and the extent of change permitted to a product can have a significant impact on the quality
of the product (Lehman, Perry and Ramil, 1998). The procedures for releasing object code (for
example, periodic collated updates) can also affect quality (Mellor, 1983).


3.5.  Peopleware 

3.5.1. Two groups


Software production and maintenance are human intensive activities. Furthermore, they involve
people working together in teams, which are in turn part of larger organisations. Thus, no complete
account of the factors affecting maintenance can ignore the human and social elements. There are
two types of staff involved in a maintenance process: the staff in the maintenance organisation, and
the staff in the customer/client organisation. Figure 6 shows our initial model of these factors. The
definition of peopleware concepts is given in Table 7.


Table 7. Peopleware ontology definitions

Client organisation
    The organisation or organisations that use the maintained product and
    have a defined relationship with the maintenance organisation.

Maintenance organisation
    The organisation that maintains the product or products.

Human resource
    Employees of the maintenance or client organisation. Maintenance
    organisation staff can be classified as managers or engineers. (For
    simplicity we have omitted specialised QA staff who may be considered
    a special class of engineer.) Employees of the client organisation can
    be classified as users or customers. Managers in the maintenance
    organisation negotiate with customers to determine service level
    agreements and costs and scheduling of requirement enhancements.

3.5.2. Maintenance organisation staff

3.5.2.1. Staff attitudes. Staff attitudes and motivation are generally agreed to impact on the quality
of any activity. In the area of software maintenance, problems with motivation are expected because
maintenance is perceived to be of less importance and less well-rewarded than development.


Management  often  compounds  attitude  problems  by: 

•  making  maintenance  work  equivalent  to  a  punishment,  and 

•  assigning  novices  to  maintenance  work. 

This  factor  seems  difficult  to  characterise,  but  is  likely  to  have  a  major  impact  on  the  productivity 
and  quality  of  maintenance  activities  and  the  extent  to  which  the  maintenance  staff  is  receptive  to 
process  change. 

3.5.2.2. Staff responsibilities. One area that seems to have a major impact on the entire
maintenance culture of an organisation is whether or not there is a strict separation between staff
responsible for software development and those responsible for software maintenance.

At one extreme, there is no real separation between development and maintenance. This seems
to be associated with a particular type of product, i.e., a product undergoing continual evolution
that is released periodically to clients and users. The software developers incorporate corrective,
perfective and preventive maintenance tasks into a process aimed at a continuing stream of planned
enhancements. In such an environment there may be no practical difference between the tools and
procedures used for 'development' and those used for 'maintenance'. Furthermore, the personnel
themselves do not make any significant distinction between development and maintenance, which
reduces motivation problems.

At  the  other  extreme,  there  are  maintenance  organisations  that  are  completely  separate  from 
development  departments,  and  indeed  may  not  work  for  the  same  company  that  developed  the  code 
they  maintain.  In  such  an  environment,  maintenance  programmers  may  need  specially  designed 
tools  to  support  their  maintenance  tasks. 

Another  issue  is  whether  staff  are  responsible  for  the  maintenance  of  a  single  product  or  group 
of  products  (i.e.,  a  product  portfolio).  It  is  usual  for  an  evolutionary  style  of  development  to  be 
organised  around  a  single  product  or  product  family,  whereas  a  separate  maintenance  group  usually 
looks  after  a  portfolio  of  different  products. 

These are issues that should concern maintenance managers when service level agreements are
defined, or when they are initially bidding for a maintenance contract.

In general, the more skilled the maintenance staff, the better the productivity and quality of
maintenance. Different activities require different skills, so these factors need to be controlled
or specified during empirical studies of maintenance activities.

3.5.3.  Customer  and  user  staff 

Customer and user issues that affect maintenance are:

• The size of the user population, which affects the amount of work required to support a
particular application.

• The variability of the user population, which affects the scope of maintenance tasks. The more
varied the user population, the more varied the problems they will encounter and refer to the
maintenance staff.

• Whether or not the client and maintenance organisation are part of the same company.
Relationships between client and maintenance group may be less co-operative if the groups
are from different companies.

• The extent to which the customer/client and users have common goals. Customers/clients
fund maintenance activities. If they do not understand the requirements of the real users, they
may impose inappropriate service level agreements, to the detriment of the product users who
will in turn become less satisfied with the maintenance organisation.


4.  TWO  MAINTENANCE  SCENARIOS 
4.1.  Organisation  distinction 

Figure 7 shows the full maintenance ontology. In this section, we use this ontology to specify two
different maintenance scenarios. Staff responsibility seems to be one of the most important factors
in the ontology: our discussion at the WESS workshop continually returned to the issue of
whether or not the maintainers and software developers were the same people.

Therefore, in this section, we define two maintenance scenarios based on this distinction:

•  Evolutionary  development,  and 

•  Independent  maintenance  organisation. 

We  show  how  the  factors  identified  in  the  ontology  differ  in  the  two  scenarios.  In  addition,  we 
consider  for  each  the  related  industrial  concerns. 


4.2.  Evolutionary  development 


Table 8 specifies the evolutionary development scenario. In this maintenance scenario,
practitioners are often concerned with optimising the evolutionary process. Particular concerns
include:

• optimisation (and/or minimisation) of inter-release intervals,

• prediction of release quality/reliability.

Table 8. Evolutionary development scenario

Staff responsibilities
    Maintenance engineers are responsible both for producing new product
    upgrades and for correcting problems in past releases. Staff are
    responsible for the evolution of a single product or product family.

Product size
    Usually large. Examples: Space Shuttle, Microsoft Word, ICL VME
    Operating System. Note, however, large products often encourage
    small companies to produce small add-on products. These small
    products track the evolution of larger products. For example, PKZIP
    tools have evolved in line with Microsoft products from DOS to
    Windows 3.1 to Windows 98.

Development technology
    The maintenance and development technologies are identical.
    Maintenance activities do not require additional staff skills or tools.

Application domain
    Application domain knowledge is required both for maintenance and
    development.

Product age
    As the product ages, the original software developers will move to
    other jobs so some expertise is lost. However, there is also some
    continuity resulting from the overlap between older staff leaving and
    new staff joining the group.

Product maturity
    The impact of maturity on an evolving product depends on the client
    and user population. For shrink-wrapped products, there is a danger
    that maintenance requests arising from a large user population will
    interfere with enhancement activities. For example, defect reports
    arising from release n will be received during the development of
    release n + 1. This can be even more complicated if different clients
    do not upgrade in the same time scale, so some clients will be reporting
    defects with release n - 2 while others are reporting problems with
    release n - 1. If one product release is of particularly poor quality,
    it may generate enough defect reports to prevent software developers
    working on the next planned release. For custom products, such as
    the Space Shuttle, releases are co-ordinated with the specific client
    activities so there is less of a problem.

Maintenance management process
    The management will need to provide a means to administer the
    stream of defect reports from users. Release schedules are based
    on prioritising customer requirements. Enhancements are funded
    either by clients (analogous to development projects), or licensing
    agreements or product sales. Licensing agreements or product sales
    usually cover maintenance costs.

Maintenance group organisation
    Support levels are often used to separate software developers from
    support staff who interface with users.

Staff attitudes
    Staff regard themselves as software engineers rather than developers
    and maintainers so there are less likely to be problems motivating staff.

Types of maintenance
    All enhancement activities are referred to as evolutionary development.

Customer and user types
    See product maturity.

Document quality
    In principle, the original software documentation would continue to be
    updated as part of the evolutionary release cycle. However, in practice
    this would depend on the organisational culture and management
    practices.

Table 9. Independent Maintenance Group Scenario


Staff  responsibilities 


Product  size 

Development  technology 


Application  domain 


Product  age 


Product  maturity 

Maintenance  management 
process 


Maintenance  Group 
Organisation 

Staff  attitudes 

Types  of  maintenance 
Customer  and  user  types 


Document  quality 


Maintenance  engineers  are  responsible  for  producing  product 
upgrades  that  may  include  changes  due  to  enhancements  and 
corrections  of  maintenance  tasks.  They  will  usually  not  have 
been  involved  in  original  product  development.  Staff  is  usually 
responsible  for  a  portfolio  of  products. 

Individual  elements  in  a  portfolio  will  be  of  different  sizes. 

Usually  different  products  in  different  portfolios  will  have 
been  produced  using  different  technologies.  The  maintenance 
organisation  will  often  need  to  support  many  different  technologies 
although  the  technologies,  required  by  an  individual  maintainer 
will  usually  be  restricted. 

It  the  portfolio  of  products  is  very  diverse,  it  will  be  difficult 
to  ensure  that  all  maintenance  staff  have  appropriate  domain 
knowledge. 

Different  products  will  have  different  ages.  This  makes  the 
maintenance  of  portfolios  complex  and  planning  and  costing 
maintenance  activities  difficult. 

Different  products  will  have  different  levels  of  maturitv. 

The  management  will  need  to  provide  a  means  to  administer  the 
stream  of  defect  reports  from  users.  They  need  fairly  complex 
estimating  and  risk  management  procedures  to  cope  with  the 
complexity  inherent  in  administering  portfolios.  This  will  be  less 
formal  if  the  client  and  maintenance  group  work  for  the  same 
company.  Relationships  with  customers  are  usually  mandated 
by  a  service  agreement,  although  adaptive  maintenance  may  be 
managed  like  a  development  project. 

Support  levels  are  often  used  to  separate  software  developers  from 
support  staff  who  interface  with  users. 

Motivation  is  likely  to  be  particularly  important  in  maintenance 
groups. 

All  the  standard  types  of  maintenance  are  performed. 

There  seem  to  be  two  different  scenarios:  One  client — manv  users, 
e.g.  in-house  support  groups.  Many  Clients — many  users,  e.g.  a 
third  party  maintenance  shop.  Note  that  in  some  cases  the  number 
of  items  in  the  portfolio  is  important.  Some  maintenance  shops 
support  one  large  custom  product  in  each  client  portfolio  e  o 
Department  of  Defense  in  the  U.S.A. 

This  is  a  critical  issue  for  third  party  maintenance  shops  since 
the)  seldom  have  any  access  to  software  developers.  For  in-house 
support  groups  it  may  be  less  of  a  problem  because  they  may  have 
access  to  the  original  developers. 


Copyright © 1999 John Wiley & Sons, Ltd.

J. Softw. Maint: Res. Pract. 11, 365-389 (1999)




•  effort  estimation  for  individual  enhancement  projects,  and 

•  planning  functional  contents  of  releases  to  minimise  the  risk  of  destabilising  the  product  while 
achieving  customer/client  required  functionality. 

Another  important  concern  is  the  impact  of  new  development  paradigms  on  system  evolution, 
e.g.,  RAD  products,  COTS-based  products  and  object-oriented  products. 


4.3.  Independent  maintenance  group 

Table 9 specifies the independent maintenance group scenario. In this scenario, industry concerns
differ according to whether the maintenance 'shop' is in-house or a third-party organisation. In
particular, third-party organisations have concerns about bidding for maintenance contracts (in terms
of estimation processes, accuracy and risks) that are less important for in-house maintenance
groups (unless they are candidates for outsourcing). Furthermore, outsourcing organisations,
particularly those that take over in-house organisations, have major management concerns about
achieving a common organisational culture and changing the working methods of
organisations they absorb (Tittle, 1998; Ketler and Willems, 1999).

All  types  of  maintenance  group  have  concerns  about  maintenance  task  estimating  and  planning 
and  improving  efficiency  of  maintenance  activities.  An  important  issue  for  such  organisations 
is  the  need  for  re-engineering  methods  and  tools  to  address  the  problem  of  lack  of  adequate 
specification/design  documentation  in  older  products. 


5.  CONCLUSIONS 

This  paper  has  presented  an  ontology  of  software  maintenance  aimed  at  assisting  researchers  to 
report  sufficient  contextual  detail  for  other  researchers  and  practitioners  to  understand  the  results 
of  empirical  studies.  We  developed  the  ontology  from  our  personal  experiences  of  the  maintenance 
process  and  have  discussed  two  different  maintenance  scenarios  in  terms  of  the  ontology.  Figure  7 
summarises  the  ontology,  modelled  in  UML. 

One of the problems with the model is that competency questions provide a criterion for inclusion
of  a  factor  in  the  model,  but  they  do  not  provide  completion  criteria,  nor  do  they  provide  any  concept 
of  relative  importance.  Thus,  the  elements  identified  in  the  model  are  things  that  a  researcher  needs 
to  report  when  describing  empirical  studies,  but  there  may  be  other  factors  we  have  not  included. 
We  must  emphasise  that,  even  using  this  ontology  as  a  guide,  it  is  still  the  responsibility  of  the 
individual  researcher  to  attempt  to  identify  any  special  conditions  that  apply  to  his/her  results. 
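
As an illustration of how a researcher might record the contextual factors the ontology asks for, the sketch below stores the twelve factors used as row labels in Tables 8 and 9 in a simple record and lists the ones a study description leaves blank. The class and function names are hypothetical and are not part of the ontology; the field list is taken from the table rows and, as noted above, may not be complete.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MaintenanceContext:
    """Context factors for one empirical maintenance study (hypothetical record)."""
    staff_responsibilities: Optional[str] = None
    product_size: Optional[str] = None
    development_technology: Optional[str] = None
    application_domain: Optional[str] = None
    product_age: Optional[str] = None
    product_maturity: Optional[str] = None
    maintenance_management_process: Optional[str] = None
    maintenance_group_organisation: Optional[str] = None
    staff_attitudes: Optional[str] = None
    types_of_maintenance: Optional[str] = None
    customer_and_user_types: Optional[str] = None
    document_quality: Optional[str] = None


def unreported_factors(context: MaintenanceContext) -> List[str]:
    """Return the names of the factors the study description has not filled in."""
    return [f.name for f in fields(context) if getattr(context, f.name) is None]
```

A study report that fills in only product size and product age would be flagged as missing the other ten factors.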

Formally,  the  ontology  presented  in  this  paper  is  not  complete.  We  have  not  attempted  to  formalise 
the  ontology  using  predicate  logic,  nor  have  we  fully  evaluated  it.  Furthermore,  since  we  are  not 
attempting to integrate our ontology into a knowledge-based system, we do not believe such a
formalisation is necessary. In its current form, we believe the ontology provides useful insights
into the type of information researchers should report if we are to understand fully the results of
empirical studies of maintenance. Only if the software maintenance community were considering
a large-scale database to register empirical research results, would a formalised, fully-evaluated
ontology be necessary.

Abstract

This  paper  addresses  the  problem  of  how  to  produce  reliable  software  that  is  also  flexible  and  cost 
effective  for  the  DoD  distributed  software  domain.  DoD  software  systems  fall  into  two 
categories: information systems and war fighter systems. Both types of systems can be distributed,
heterogeneous  and  network-based,  consisting  of  a  set  of  components  running  on  different 
platforms  and  working  together  via  multiple  communication  links  and  protocols.  We  propose  to 
tackle  the  problem  using  prototyping  and  a  “wrapper  and  glue”  technology  for  interoperability 
and  integration.  This  paper  describes  a  distributed  development  environment,  CAPS  (Computer- 
Aided  Prototyping  System),  to  support  rapid  prototyping  and  automatic  generation  of  wrapper 
and  glue  software  based  on  designer  specifications.  The  CAPS  system  uses  a  fifth-generation 
prototyping  language  to  model  the  communication  structure,  timing  constraints,  I/O  control  and 
data  buffering  that  comprise  the  requirements  for  an  embedded  software  system.  The  language 
supports  the  specification  of  hard  real-time  systems  with  reusable  components  from  domain 
specific  component  libraries.  CAPS  has  been  used  successfully  as  a  research  tool  in  prototyping 
large  war-fighter  control  systems  (e.g.  the  command-and-control  station,  cruise  missile  flight 
control system, missile defense systems) and has demonstrated its capability to support the
development of large complex embedded software.

1. Introduction

DoD software systems are currently categorized into Management Information Systems (MIS)
and War Fighter/Embedded Real-time Systems. Both types of systems can be distributed,
heterogeneous  and  network-based,  consisting  of  a  set  of  subsystems,  running  on  different 
platforms  that  work  together  via  multiple  communication  links  and  protocols.  This  paper 
addresses  the  problem  of  how  to  produce  reliable  software  that  is  also  flexible  and  cost  effective 
for  the  DoD  distributed  software  system  domain,  as  depicted  in  the  shaded  area  in  Figure  1. 


This research was supported in part by the U.S. Army Research Office under contract/grant numbers
35037-MA and 40473-MA.

[Figure 1 legend: MIS COTS/GOTS components/subsystems; War Fighter/Embedded Real-time
components/subsystems; components/subsystems communicating over a heterogeneous network
under strict timing constraints (for example, future C4ISR).]

Figure 1. DoD Computer-based systems


Many DoD information systems are COTS/GOTS based (commercial/government off-the-shelf,
including “legacy systems”). While using individual COTS/GOTS components saves DoD money,
it shifts problems from software development to software integration and interoperability. It is a
common belief that interoperability problems are caused by incompatible interfaces and data
formats, and can be fixed “easily” using interface converters and data formatters. However, the
real challenges in fixing interoperability problems are incompatible data interpretations,
inconsistent assumptions, requirement extensions triggered by global integration issues, and timely
data communication between components. Many DoD information systems, especially C4ISR
systems, operate under tight timing constraints. Builders of COTS/GOTS based systems have no
control over the network on which components communicate. They have to work with available
infrastructure and need tools and methods to assist them in making correct design decisions to
integrate COTS/GOTS components into a distributed network-based system. Similar integration
and interoperability problems are common in the commercial sector, and real-time issues are a
growing concern. For example, just-in-time manufacturing, on-demand accounting, and factory
automation all involve timing requirements. Although software engineers have more control over
interfaces and data compatibility between individual components of war fighter systems, they
encounter similar data communication problems when they need to connect these components via
heterogeneous networks.

We  can  tackle  the  problem  using  prototyping  and  a  “wrapper  and  glue”  technology  for 
interoperability  and  integration.  Our  approach  is  based  on  a  distributed  architecture  where 
components  collaborate  via  message  passing  over  heterogeneous  networks.  It  uses  a  generic 
interface  that  allows  system  designers  to  specify  communication  and  operating  requirements 
between  components  as  parameters,  based  on  properties  of  COTS/GOTS  components.  A  separate 
parameterized  model  of  network  characteristics  constrains  the  concrete  “glue”  software  generated 
for  each  node.  The  model  enables  partial  specification  of  requirements  by  the  system  designers, 
and  allows  them  to  explore  design  alternatives  and  determine  missing  parameters  via  rapid 
prototyping.

2. The Wrapper and Glue Approach


The  cornerstone  of  our  approach  is  automatic  generation  of  wrapper  and  glue  software  based  on 
designer  specifications.  This  software  bridges  interoperability  gaps  between  individual 
COTS/GOTS  components.  Wrapper  software  provides  a  common  message-passing  interface  for 
components that frees developers from the error-prone tasks of implementing interface and data
conversion for individual components. The glue software schedules time-constrained actions and
carries out the actual communication between components (see Figure 2).
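
The division of labour between wrapper and glue can be sketched as follows. This is a minimal illustration under stated assumptions, not the code that CAPS generates: the `Message`, `Wrapper` and `Glue` classes, the conversion callbacks and the single `max_latency_s` parameter are all hypothetical names chosen for the example, and delivery happens within one process rather than over a network.

```python
import queue
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Message:
    stream: str      # name of the data stream the payload belongs to
    payload: Any     # data already converted to the common format
    sent_at: float   # time.monotonic() value taken when the message was sent


class Wrapper:
    """Gives one component a common send/receive interface (hypothetical)."""

    def __init__(self, to_common: Callable[[Any], Any],
                 from_common: Callable[[Any], Any]) -> None:
        self.to_common = to_common      # native format -> common format
        self.from_common = from_common  # common format -> native format
        self.inbox: "queue.Queue[Message]" = queue.Queue()

    def send(self, glue: "Glue", dest: "Wrapper", stream: str, native: Any) -> None:
        glue.deliver(dest, Message(stream, self.to_common(native), time.monotonic()))

    def receive(self) -> Any:
        # Raises queue.Empty if nothing has been delivered yet.
        return self.from_common(self.inbox.get_nowait().payload)


class Glue:
    """Carries messages between wrappers and records late deliveries."""

    def __init__(self, max_latency_s: float) -> None:
        self.max_latency_s = max_latency_s
        self.late: List[Message] = []

    def deliver(self, dest: Wrapper, msg: Message) -> None:
        if time.monotonic() - msg.sent_at > self.max_latency_s:
            self.late.append(msg)  # this sketch only records the violation
        dest.inbox.put(msg)
```

For example, a component that reports altitude in feet can be wrapped with `to_common=lambda feet: feet * 0.3048`, so that every receiver sees metres without knowing the sender's native units.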


[Figure 2 diagram labels: a missile guidance system running on a QNX RTOS; C based navigation
software; a theater defense simulation system running on a UNIX OS; wrapper; glue; network
service; different kinds of communication links.]

Figure 2. The wrapper and glue software


Our  glue-and-wrapper  approach  uses  rapid  prototyping  and  automated  software  synthesis  to 
improve  reliability.  It  differs  from  proxy  and  broker  patterns  in  the  object-oriented  design 
literature  [4]  in  that  it  provides  a  formal  model  to  support  hardware/software  co-design.  Existing 
pattern  approaches  focus  on  low  level  data  transfer  issues.  Our  approach  allows  system  designers 
to  concentrate  on  the  difficult  interoperability  problems  and  issues,  while  freeing  them  from 
implementation  details.  Prototyping  with  engineering  decision  support  can  help  identify  and 
resolve  requirements  conflicts  and  semantic  incompatibilities. 


Glue  code  works  on  two  levels.  It  controls  the  orderly  execution  of  components  within  a 
subsystem,  and  ensures  the  timely  delivery  of  information  between  components  across  a  network. 
Automated  generation  of  glue  code  depends  on  automated  local  and  distributed  scheduling  of 
actions  on  heterogeneous  computing  platforms.  Identifying  timing  constraint  conflicts  and 
assessing  constraint  feasibility  are  critical  in  designing  and  constructing  real-time  software  quickly. 
Checking whether a set of timing and task precedence constraints can be met on a chosen
hardware configuration is known to be a difficult problem. Computer aid is needed in tackling
such problems.

3. The Computer Aided Prototyping System (CAPS)


The  value  of  computer  aided  prototyping  in  software  development  is  clearly  recognized.  It  is  a 
very  effective  way  to  gain  understanding  of  the  requirements,  reduce  the  complexity  of  the 
problem  and  provide  an  early  validation  of  the  system  design.  Bernstein  estimated  that  for  every 
dollar  invested  in  prototyping,  one  can  expect  a  $1.40  return  within  the  life  cycle  of  the  system 
development [1]. To be effective, prototypes must be constructed and modified rapidly,
accurately,  and  cheaply  [8].  Computer  aid  for  rapidly  and  inexpensively  constructing  and 
modifying  prototypes  makes  it  feasible  [10].  The  Computer-Aided  Prototyping  System  (CAPS),  a 
research  tool  developed  at  the  Naval  Postgraduate  School,  is  an  integrated  set  of  software  tools 
that  generate  source  programs  directly  from  high  level  requirements  specifications  [7]  (Figure  3). 

It  provides  the  following  kinds  of  support  to  the  prototype  designer: 

(1)  timing  feasibility  checking  via  the  scheduler, 

(2)  consistency  checking  and  automated  assistance  for  project  planning,  configuration 
management,  scheduling,  designer  task  assignment,  and  project  completion  date 
estimation  via  the  Evolution  Control  System, 

(3)  computer-aided  design  completion  via  the  editors, 

(4)  computer-aided  software  reuse  via  the  software  base,  and 

(5)  automatic  generation  of  wrapper  and  glue  code. 
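
Item (1), timing feasibility checking, can be illustrated with a very simple test. The sketch below is not the CAPS scheduler; it assumes a single processor and purely periodic operators, and the names `PeriodicOperator`, `period_ms` and `max_exec_ms` are hypothetical. It checks only processor utilization: a total above 1 means no schedule can exist, while a total of at most 1 is a necessary condition in general (it is sufficient only in special cases, such as preemptive earliest-deadline-first scheduling with deadlines equal to periods).

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable


@dataclass(frozen=True)
class PeriodicOperator:
    name: str
    period_ms: int     # how often the operator must fire
    max_exec_ms: int   # worst-case execution time of one firing


def utilization(ops: Iterable[PeriodicOperator]) -> Fraction:
    """Fraction of one processor needed by all operators together."""
    return sum((Fraction(op.max_exec_ms, op.period_ms) for op in ops), Fraction(0))


def passes_utilization_test(ops: Iterable[PeriodicOperator]) -> bool:
    """Necessary condition for feasibility on a single processor."""
    return utilization(ops) <= 1
```

Exact fractions are used instead of floating point so that a load of exactly 1 is not rejected because of rounding.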

The  efficacy  of  CAPS  has  been  demonstrated  in  many  research  projects  at  the  Naval  Postgraduate 
School  and  other  facilities. 


Figure 3. The CAPS Rapid Prototyping Environment

3.1 Overview of the CAPS Method


There  are  four  major  stages  in  the  CAPS  rapid  prototyping  process:  software  system  design, 
construction,  execution,  and  requirements  evaluation/modification  (Figure  4). 


Figure  4.  Iterative  Prototyping  Process  in  CAPS 


The  initial  prototype  design  starts  with  an  analysis  of  the  problem  and  a  decision  about  which 
parts  of  the  proposed  system  are  to  be  prototyped.  Requirements  for  the  prototype  are  then 
generated,  either  informally  (e.g.  English)  or  in  some  formal  notation.  These  requirements  may  be 
refined  by  asking  users  to  verify  their  completeness  and  correctness. 

After  some  requirements  analysis,  the  designer  uses  the  CAPS  PSDL  editor  to  draw  dataflow 
diagrams  annotated  with  nonprocedural  control  constraints  as  part  of  the  specification  of  a 
hierarchically  structured  prototype,  resulting  in  a  preliminary,  top-level  design  free  from 
programming level details. The user may continue to decompose any software module until its
components can be realized via reusable components drawn from the software base or new atomic
components.

This  prototype  is  then  translated  into  the  target  programming  language  for  execution  and 
evaluation.  Debugging  and  modification  utilize  a  design  database  that  assists  the  designers  in 
managing  the  design  history  and  coordinating  change,  as  well  as  other  tools  shown  in  Figure  3. 

3.2  CAPS  as  a  Requirements  Engineering  Tool 

The  requirements  for  a  software  system  are  expressed  at  different  levels  of  abstraction  and  with 
different  degrees  of  formality.  The  highest  level  requirements  are  usually  informal  and  imprecise 
but  they  are  understood  best  by  the  customers.  The  lower  levels  are  more  technical,  precise,  and 
better suited for the needs of the system analysts and designers, but they are further removed from
the user's experiences and less well understood by the customers. Because of the differences in the
kinds of descriptions needed by the customers and developers, it is not likely that any single
representation for requirements can be the “best” one for supporting the entire software
development process. CAPS provides the necessary means to bridge the communication gap
between the customers and developers. The CAPS tools are based on the Prototype System
Description Language (PSDL), which is designed specifically for specifying hard real-time
systems [5, 6]. It has a rich set of timing specification features and offers a common baseline from
which  users  and  software  engineers  describe  requirements.  The  PSDL  descriptions  of  the 
prototype  produced  by  the  PSDL  editor  are  very  formal,  precise  and  unambiguous,  meeting  the 
needs  of  the  system  analysts  and  designers.  The  demonstrated  behavior  of  the  executable 
prototype,  on  the  other  hand,  provides  concrete  information  for  the  customer  to  assess  the 
validity  of  the  high  level  requirements  and  to  refine  them  if  necessary. 

3.3  CAPS  as  a  System  Testing  and  Integration  Tool 

Unlike  throw-away  prototypes,  the  process  supported  by  CAPS  provides  requirements  and 
designs  in  a  form  that  can  be  used  in  construction  of  the  operational  system.  The  prototype 
provides  an  executable  representation  of  system  requirements  that  can  be  used  for  comparison 
during  system  testing.  The  existence  of  a  flexible  prototype  can  significantly  ease  system  testing 
and  integration.  When  final  implementations  of  subsystems  are  delivered,  integration  and  testing 
can  begin  before  all  of  the  subsystems  are  complete  by  combining  the  final  versions  of  the 
completed  subsystems  with  prototype  versions  of  the  parts  that  are  still  being  developed. 
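
The mixed configuration described above, with final versions where they exist and prototype versions elsewhere, can be sketched as a simple selection step. This is an illustrative assumption about how such an assembly might be expressed, not a description of a CAPS tool; the `Subsystem` type and the `assemble` function are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical interface: a subsystem maps an input record to an output record.
Subsystem = Callable[[dict], dict]


def assemble(prototypes: Dict[str, Subsystem],
             finals: Dict[str, Subsystem]) -> Dict[str, Subsystem]:
    """Use the final implementation where one has been delivered, else the prototype."""
    unknown = set(finals) - set(prototypes)
    if unknown:
        raise KeyError(f"final implementations without a prototype: {sorted(unknown)}")
    return {name: finals.get(name, proto) for name, proto in prototypes.items()}
```

Because both versions of a subsystem share one interface, integration tests written against the prototype can be rerun unchanged as each final implementation arrives.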

3.4  CAPS  as  an  Acquisition  Tool 

Decisions  about  awarding  contracts  for  building  hard  real-time  systems  are  risky  because  there  is 
little  objective  basis  for  determining  whether  a  proposed  contract  will  benefit  the  sponsor  at  the 
time  when  those  decisions  must  be  made.  It  is  also  very  difficult  to  determine  whether  a  delivered 
system  meets  its  requirements.  CAPS,  besides  being  a  useful  tool  to  the  hard  real-time  system 
developers,  is  also  very  useful  to  the  customers.  Acquisition  managers  can  use  CAPS  to  ensure 
that  acquisition  efforts  stay  on  track  and  that  contractors  deliver  what  they  promise.  CAPS 
enables  validation  of  requirements  via  prototyping  demonstration,  greatly  reducing  the  risk  of 
contracting  for  real-time  systems. 

3.5  A  Platform  Independent  User  Interface 

The  current  CAPS  system  provides  two  interfaces  for  users  to  invoke  different  CAPS  tools  and  to 
enter  the  prototype  specification.  The  main  interface  (Figure  5)  was  developed  using  the  TAE+ 
Workbench  [11].  The  Ada  source  code  generated  automatically  from  the  graphic  layout  uses 
libraries that only work on SunOS 4.1.x operating systems. The PSDL editor (Figure 6), which
allows users to specify the prototype via augmented dataflow diagrams, was implemented in C++
and can only be executed under SunOS 4.1.x environments. A portable implementation of the
CAPS main interface and the PSDL editor was needed to allow users to use CAPS to build PSDL
prototypes on different platforms. We chose to overcome these limitations by reimplementing
the main interface (Figure 7) and the PSDL editor (Figure 8) using the Java programming
language [2].

Figure 5. Main Interface of CAPS Release 2.0

Figure 6. PSDL Editor of CAPS Release 2.0

Figure 7. Main Interface of the new CAPS

Figure 8. PSDL Editor of the new CAPS


The new graphical user interface, called the Heterogeneous Systems Integrator (HSI), is similar to
that of the previous CAPS. Users of previous CAPS versions will easily adapt to the new interface.
There are some new features in this implementation, which do not affect the functionality of the
program, but provide a friendlier interface and easier use. The major improvement is the addition
of the tree panel on the left side of the editor. The tree panel provides a better view of the overall
prototype structure since all of the PSDL components can be seen in a hierarchy. The user can
navigate through the prototype by clicking on the names of the components on the tree panel.
Thus, it is possible to jump to any level in the hierarchy, which was not possible earlier.

4.  A  Simple  Example:  Prototyping  a  C3I  Workstation 

To create a first version of a new prototype, users can select “New” from the “Prototype” pull-down
menu of the CAPS main interface (Figure 9). The user will then be asked to enter the
name of the new prototype (say “c3i_system”), and the CAPS PSDL editor will be automatically
invoked with a single initial root operator (with the same name as the prototype).

Figure 9. Creating a new prototype called C3I_System


CAPS  allows  the  user  to  specify  the  requirements  of  prototypes  as  augmented  dataflow  graphs. 
Using  the  drawing  tools  provided  by  the  PSDL  editor,  the  user  can  create  the  top-level  dataflow 
diagram  of  the  c3i_system  prototype  as  shown  in  Figure  10,  where  the  c3i_system  prototype  is 
modeled  by  nine  modules,  communicating  with  each  other  via  data  streams.  To  model  the 
dynamic  behavior  of  these  modules,  the  dataflow  diagram  is  augmented  with  control  and  timing 
constraints. For example, the user may want to specify that the weapons_interface module has a
maximum response time of 3 seconds to handle the event triggered by the arrival of new data in
the weapon_status_data stream, and it only writes output to the weapon_emrep stream if the
status of the weapon_status_data is damage, service_required, or out_of_ammunition. CAPS
allows the user to specify these timing and control constraints using the pop-up operator property
menu (Figure 11), resulting in a top-level PSDL program shown in Figure 12.
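
The weapons_interface constraints in this example can be mimicked in ordinary code to show what they mean operationally. The sketch below is an assumption-laden illustration, not PSDL and not CAPS output: the dictionary keys `weapon` and `status` and the injectable `clock` parameter are hypothetical, and only the handler's own running time is measured, whereas a real response-time bound would also cover scheduling delay after the data arrives.

```python
import time
from typing import Callable, Optional

REPORTABLE_STATUSES = {"damage", "service_required", "out_of_ammunition"}
MAX_RESPONSE_TIME_S = 3.0


def weapons_interface(weapon_status_data: dict,
                      clock: Callable[[], float] = time.monotonic) -> Optional[dict]:
    """Handle one new weapon_status_data item; return a weapon_emrep item or None."""
    started = clock()
    report = None
    # Control constraint: write output only for the three reportable statuses.
    if weapon_status_data["status"] in REPORTABLE_STATUSES:
        report = {"weapon": weapon_status_data["weapon"],
                  "status": weapon_status_data["status"]}
    # Timing constraint: the handler must finish within the response-time bound.
    if clock() - started > MAX_RESPONSE_TIME_S:
        raise TimeoutError("weapons_interface exceeded its maximum response time")
    return report
```

Passing the clock as a parameter makes the timing check testable without waiting three real seconds.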

To  complete  the  specification  of  the  c3i_system  prototype,  the  user  must  specify  how  each 
module  will  be  implemented  by  choosing  the  implementation  language  for  the  module  via  the 
operator  property  menu.  The  implementation  of  a  module  can  be  in  either  the  target  programming 
language  or  PSDL.  A  module  with  an  implementation  in  the  target  programming  language  is 
called  an  atomic  operator.  A  module  that  is  decomposed  into  a  PSDL  implementation  is  called  a 
composite  operator.  Module  decomposition  can  be  done  by  selecting  the  corresponding  operator 
in  the  tree-panel  on  the  left  side  of  the  PSDL  editor. 
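
The distinction between atomic and composite operators amounts to a tree, which can be sketched as a small data structure. The `Operator` class below is a hypothetical model for illustration and does not reflect how the PSDL editor stores a prototype; an operator with no children is treated as atomic.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Operator:
    name: str
    language: Optional[str] = None   # implementation language of an atomic operator
    children: List["Operator"] = field(default_factory=list)  # sub-operators of a composite

    @property
    def is_atomic(self) -> bool:
        return not self.children

    def count(self) -> Tuple[int, int]:
        """Return (composite, atomic) operator counts for this subtree."""
        if self.is_atomic:
            return (0, 1)
        composite, atomic = 1, 0
        for child in self.children:
            c, a = child.count()
            composite += c
            atomic += a
        return (composite, atomic)
```

Decomposing a module then corresponds to giving a formerly atomic operator a list of children, which leaves the rest of the tree untouched.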

CAPS  supports  an  incremental  prototyping  process.  The  user  may  choose  to  implement  all  nine 
modules  as  atomic  operators  (using  dummy  components)  in  the  first  version,  so  as  to  check  out 
the  global  effects  of  the  timing  and  control  constraints.  Then,  he/she  may  choose  to  decompose 
the  comms_interface  module  into  more  detailed  subsystems  and  implement  the  sub-modules  with 
reusable  components,  while  leaving  the  others  as  atomic  operators  in  the  second  version  of  the 
prototype,  and  so  on. 


[Figure 10 screenshot: the PSDL editor window for c3i_system. The tree panel lists the
prototype's operators and data streams (names truncated in the screenshot); legible diagram
labels include input_link_message, weapon_status_data, track_database_manager and
position_data.]

Figure 10. Top-level Dataflow Diagram of the c3i_system.

OPERATOR c3i_system
SPECIFICATION
DESCRIPTION

{This module implements a simplified version of
a generic C3I workstation.}

Figure 12. Top-level Specification of the c3i_system

To facilitate the testing of the prototypes, CAPS provides the user with an execution support
system that consists of a translator, a scheduler and a compiler. Once the user finishes specifying
the prototype, he/she can invoke the translator and the scheduler from the CAPS main interface
to analyze the timing constraints for feasibility and to generate a supervisor module for each
subsystem of the prototype in the target programming language. Each supervisor module
consists of a set of driver procedures that realize all the control constraints, a high priority task
(the static schedule) that executes the time-critical operators in a timely fashion, and a low
priority dynamic schedule task that executes the non-time-critical operators when there is time
available. The supervisor module also contains information that enables the compiler to
incorporate all the software components required to implement the atomic operators and
generate the binary code automatically. The translator/scheduler also generates the glue code
needed for timely delivery of information between subsystems across the target network.
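
The interplay between the static schedule and the dynamic schedule can be sketched with a simulated clock. This is a simplified model under stated assumptions, not the generated supervisor module: time is counted in whole milliseconds rather than measured, operators are not preempted, and the `run_frame` function and its tuple layouts are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Action = Callable[[], None]


def run_frame(static_schedule: List[Tuple[int, int, Action]],
              dynamic_queue: Deque[Tuple[int, Action]],
              frame_ms: int) -> List[int]:
    """Simulate one schedule frame; return the start times given to dynamic operators.

    static_schedule: (start_ms, duration_ms, action) entries, sorted and non-overlapping.
    dynamic_queue:   (duration_ms, action) entries that run only in idle gaps.
    """
    dynamic_starts: List[int] = []
    now = 0
    # A zero-length sentinel at the end of the frame lets trailing slack be used too.
    for start_ms, duration_ms, action in static_schedule + [(frame_ms, 0, lambda: None)]:
        # Fill the idle gap before the next time-critical operator, first come first served.
        while dynamic_queue and now + dynamic_queue[0][0] <= start_ms:
            d_duration, d_action = dynamic_queue.popleft()
            d_action()
            dynamic_starts.append(now)
            now += d_duration
        now = start_ms
        action()
        now += duration_ms
    return dynamic_starts
```

Time-critical operators always start at their planned offsets, and a non-time-critical operator that does not fit in the current gap simply waits for a later one.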

For prototypes which require sophisticated graphic user interfaces, the CAPS main interface
provides an interface editor to interactively sculpt the interface. In the c3i_system prototype we
chose to decompose the comms_interface, the track_database_manager and the user_interface
modules into subsystems, resulting in a hierarchical design consisting of 8 composite operators and
26 atomic operators. The user interface of the prototype has a total of 14 panels, four of
which are shown in Figure 13. The corresponding Ada program has a total of 10.5K lines of
source code. Among the 10.5K lines of code, 3.5K lines come from the supervisor module that was
generated automatically by the translator/scheduler and 1.7K lines were automatically
generated by the interface editor [9].

5.  Conclusion 

CAPS  has  been  used  successfully  as  a  research  tool  in  prototyping  large  war-fighter  control 
systems  (e.g.  the  command-and-control  station,  cruise  missile  flight  control  system,  missile 
defense  systems)  and  demonstrated  its  capability  to  support  the  development  of  large  complex 
embedded  software.  Specific  payoffs  include: 

(1)  Formulate/validate  requirements  via  prototype  demonstration  and  user  feedback 

(2)  Assess  feasibility  of  real-time  system  designs 

(3)  Enable  early  testing  and  integration  of  completed  subsystems 

(4)  Support  evolutionary  system  development,  integration  and  testing 

(5)  Reduce  maintenance  costs  through  systematic  code  generation 

(6)  Produce  high  quality,  reliable  and  flexible  software 

(7)  Avoid  schedule  overruns 

In  order  to  evaluate  the  benefits  derived  from  the  practice  of  computer-aided  prototyping  within 
the  software  acquisition  process,  we  conducted  a  case  study  in  which  we  compared  the  cost  (in 
dollar  amounts)  required  to  perform  requirements  analysis  and  feasibility  study  for  the  c3i  system 
using  the  2167A  process,  in  which  the  software  is  coded  manually,  and  the  rapid  prototyping 
process, where part of the code is automatically generated via CAPS [3]. We found that even
under very conservative assumptions, using the CAPS method resulted in a cost reduction of
$6,300, a 27% cost saving. Taking the results of this comparison, then projecting to a mission
control software system, the command and control segment (CCS), we estimated that there would
be a cost saving of 12 million dollars. Applying this concept to an engineering change to a typical
component of the CCS software showed a further cost savings of $25,000.

Figure 13. User Interface of the c3i_system

Abstract

Current  early  risk  assessment  techniques  rely  on  subjective  human  judgments  and 
unrealistic  assumptions  such  as  fixed  requirements  and  work  breakdown  structures.  This  is  a 
weak  approach  because  different  people  could  arrive  at  different  conclusions  from  the  same 
scenario  even  for  projects  with  a  stable  and  well-defined  scope,  and  such  projects  are  rare.  This 
paper  introduces  a  formal  model  to  assess  the  risk  and  the  duration  of  software  projects 
automatically,  based  on  objective  indicators  that  can  be  measured  early  in  the  process.  The 
model  has  been  designed  to  account  for  significant  characteristics  of  evolutionary  software 
processes,  such  as  requirement  complexity,  requirement  volatility  and  organizational  efficiency. 
The  formal  model  based  on  these  three  indicators  estimates  the  duration  and  risk  of  evolutionary 
software processes. The approach supports (a) automation of risk assessment and (b) early
estimation  methods  for  evolutionary  software  processes. 

1.  Introduction 


Software applications have grown in size and complexity, covering many human activities of
importance to society. The report of the President's Information Advisory Committee calls
software "the new physical infrastructure of the information age". Unfortunately, the ability to
build software has not increased proportionately to demand [Hall 1997, p. xv], and shortfalls in
this regard are a growing concern. According to the Standish Group, in 1995 84% of software
projects  finished  over  time  or  budget,  and  $80  billion  -  $100  billion  is  spent  annually  on 
cancelled  projects  in  the  US.  Developing  software  is  still  a  high-risk  activity. 

There  have  been  many  approaches  to  improving  this  situation,  mostly  focused  on  increasing 
productivity  via  improvements  in  technology  or  management.  Although  better  productivity  is 
certainly  welcome,  closer  examination  shows  that  these  efforts  address  only  half  of  the  problem. 
A project runs over time or over budget if actual performance does not match estimates. Current
estimation  techniques  are  far  from  reliable,  and  tend  to  systematically  produce  overly  optimistic 
estimates.  More  accurate  early  estimates  could  help  reduce  wasted  resources  associated  with 
overruns  and  cancelled  projects  in  two  ways:  if  costs  are  known  to  be  too  high  at  the  outset,  the 
scope  of  the  project  could  be  reduced  to  enable  completion  within  time  and  budget,  or  it  could 
be  cancelled  before  it  starts,  and  instead  the  resources  could  be  used  to  successfully  complete 
other  feasible  projects. 

This  paper  therefore  focuses  on  improved  risk  assessment  for  software  projects.  We  address 
project  risks  related  to  schedule  and  budget,  and  focus  mostly  on  completion  time  of  the  project. 
Current  risk  assessment  standards  are  weak  because  they  rely  on  subjective  human  expertise, 
assume  frozen  requirements,  or  depend  on  metrics  difficult  to  measure  until  it  is  too  late.  This 
paper  describes  a  formal  risk  assessment  model  based  on  metrics  and  sensitive  to  requirements 
volatility. Further details can be found in [Nogueira 2000]. The model is especially suited for
evolutionary  prototyping  and  incremental  software  development. 

Section  2  defines  the  problem  we  are  addressing.  Section  3  analyzes  relevant  previous  work. 
Section  4  presents  and  evaluates  our  project  risk  model.  Section  5  outlines  how  systematic  risk 
assessment fits into iterative prototyping. Section 6 concludes.

2. The Problem


As the range and complexity of computer applications have grown, the cost of software
development has become the major expense of computer-based systems [Boehm 1981].
Research shows that in private industry as well as in government environments,
schedule and cost overruns are tragically common [Luqi 1989, Jones 1994, Boehm 1981].
Despite improvements in tools and methodologies, there is little evidence of success in
improving the process of moving from the concept to the product, and little progress has been
made in managing software development projects [Hall 1997]. Research shows that 45 percent
of all the causes for delayed software deliveries are related to organizational issues
[van Genuchten 1991]. A study published by the Standish Group reveals that the number of
software projects that fail has dropped from 40% in 1997 to 26% in 1999. However, the
percentage of projects with cost and schedule overruns rose from 33% in 1997 to 46% in 1999
[Reel 1999].


Despite the recent improvements introduced in software processes and automated tools, risk
assessment for software projects remains an unstructured problem dependent on human
expertise [Boehm 1988, Hall 1997]. The acquisition and development communities, both
governmental and industrial, lack systematic ways of identifying, communicating and resolving
technical uncertainty [SEI 1996].


This paper explores ways to transform risk assessment into a structured problem with
systematic  solutions.  Constructing  a  model  to  assess  risk  based  on  objectively  measurable 
parameters  that  can  be  automatically  collected  and  analyzed  is  necessary.  Solving  the  risk 
assessment  problem  with  indicators  measured  in  the  early  phases  would  constitute  a  great 
benefit  to  software  engineering.  In  these  early  phases,  changes  can  be  made  with  the  least 
impact  on  the  budget  and  schedule.  The  requirements  phase  is  the  crucial  stage  to  assess  risk 
because:  a)  it  involves  a  huge  amount  of  human  interaction  and  communication  that  can  be 
misunderstood  and  can  be  a  source  of  errors;  b)  errors  introduced  at  this  phase  are  very 
expensive  to  correct  if  they  are  discovered  late;  c)  the  existence  of  software  generation  tools  can 
diminish  the  errors  in  the  development  process  if  the  requirements  are  correct;  and  d) 
requirements  evolve  introducing  changes  and  maintenance  along  the  whole  life  cycle. 

Part of the problem is misinterpreting the importance of risk management. It is usually and
incorrectly viewed as an additional activity layered on the assigned work, or worse, as an
outside activity that is not part of the software process [Hall 1997, Karolak 1996]. One of the
goals of our research is to integrate a risk assessment model with previous research on CAPS at
NPS [Ham 99]. This integration is required in order to capture metrics automatically in the
context of a modern evolutionary prototyping and software development process. This should
provide project managers with a more complete tool that can enable improved risk assessment
without interfering with the work of a project's software engineers.

A second source of problems in risk management is the lack of tools [Karolak 1996]. The
main reason for this lack of tools is that risk assessment is apparently an unstructured problem.
To systematize unstructured problems it is necessary to define structured processes. Structured
processes involve routine and repetitive problems for which a standard solution exists.
Unstructured processes require decision-making based on a three-phase method (intelligence,
design, choice) [Turban et al. 1998]. An unstructured problem is one in which none of the three
phases is structured. Current approaches to risk management are highly sensitive to managers'
perceptions  and  preferences,  which  are  difficult  to  represent  by  an  algorithm.  Depending  on  the 
decision-maker’s  attitude  towards  risk,  he  or  she  can  decide  early  with  little  information,  or  can 
postpone  the  decision,  gaining  time  to  obtain  more  information,  but  losing  some  control. 

Footnote: CAPS stands for Computer Aided Prototyping System [Luqi 1988].

A third source of risk management problems is the confusion created by the informal use of
terms. Often, the software engineering community (and most parts of the project management
community [Wideman 1992]) uses the term "risk" casually. This term is often used to describe
different concepts. It is erroneously used as a synonym of "uncertainty" and "threat" [SEI 1996,
Hall 1997, Karolak 1996]. Generally, software risk is viewed as a measure of the likelihood of
an unsatisfactory outcome and a loss affecting the software from different points of view:
project, process, and product [Hall 1997, SEI 1996]. However, this definition of risk is
misleading  because  it  confounds  the  concepts  of  risk  and  uncertainty.  In  general,  most  parts  of 
decision-making  in  software  processes  are  under  uncertainty  rather  than  under  risk.  Uncertainty 
is  a  situation  in  which  the  probability  distribution  for  the  possible  outcomes  is  not  known. 

In  this  paper  the  term  "risk"  is  reserved  to  indicate  the  probabilistic  outcome  of  a  succession 
of  states  of  nature,  and  the  term  "threat"  is  used  to  identify  the  dangers  that  can  occur.  We 
define  risk  to  be  the  product  of  the  value  of  an  outcome  times  its  probability  of  occurrence.  This 
outcome  could  be  either  positive  (gain)  or  negative  (loss).  This  abstraction  permits  one  to 
address  not  only  the  classical  risk  management  issue,  but  also  to  discover  opportunities  leading 
to  competitive  advantage. 
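
The definition above can be illustrated with a minimal sketch. The outcome descriptions, dollar values, and probabilities below are hypothetical and chosen only to show that the same product covers both losses and gains; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One possible outcome of a project decision (all values hypothetical)."""
    description: str
    value: float        # positive for a gain, negative for a loss
    probability: float  # estimated probability of occurrence, in [0, 1]


def risk(outcome: Outcome) -> float:
    """Risk as defined in the text: value of the outcome times its probability."""
    if not 0.0 <= outcome.probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return outcome.value * outcome.probability


# Hypothetical outcomes: one threat and one opportunity.
overrun = Outcome("three-month schedule overrun", value=-120_000.0, probability=0.25)
early_delivery = Outcome("early delivery bonus", value=40_000.0, probability=0.5)

print(risk(overrun))         # negative value: an expected loss
print(risk(early_delivery))  # positive value: an expected gain
```

Treating gains and losses with one signed quantity is what lets the same assessment step surface opportunities as well as threats.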

We  address  the  issue  of  risk  assessment  by  estimating  the  probability  distribution  for  the 
possible  outcomes  of  a  project,  based  on  observed  values  of  metrics  that  can  be  measured  early 
in  the  process.  The  metrics  were  chosen  based  on  a  causal  analysis  to  identify  the  most 
important  threats  and  a  statistical  analysis  to  choose  the  shape  of  the  probability  distribution  and 
relate  its  parameters  to  readily  measurable  metrics. 

3.  Related  Work 

There  are  three  main  groups  of  research  related  to  risk: 

•  Assessing  Software  Risk  by  Measuring  Reliability.  This  group  follows  a  probabilistic 
approach  and  has  successfully  assessed  the  reliability  of  the  product  [Lyu  1995, 
Schneidewind  1975,  Musa  1998].  However,  this  approach  addresses  the  reliability  of  the 
product,  not  the  risk  of  failing  to  complete  the  project  within  budget  and  schedule 
constraints.  These  approaches  could  be  used  to  assess  risks  related  to  failures  of  software 
projects,  which  are  outside  the  scope  of  the  current  paper.  A  concern  with  these  approaches 
is  that  the  resulting  assessments  arrive  too  late  to  economically  correct  possible  faults, 
because  the  software  product  is  mostly  complete  and  development  resources  are  mostly 
gone  at  the  time  when  reliability  of  the  product  can  be  assessed  by  testing. 

•  Heuristic approaches: Other researchers assess the risk from the beginning, in parallel
with the development process. However, these approaches are less rigorous, typically
subjective and weakly structured. Basically these approaches use lists of practices and
checklists [SEI 1996, Hall 1997, Charette 1997, Jones 1994] or scoring techniques [Karolak
1996]. Paradoxically, SEI defines software technical risk as a measure of the probability and
severity of adverse effects in developing software that does not meet its intended functions
and performance requirements [SEI 1996]. However, the term "probability" is misleading
in this case because the probability distribution is unknown.

•  Macro  Model  Approaches:  A  third  group  of  researchers  uses  well  known  estimation 
models  to  assess  how  risky  a  project  could  be.  The  widely  used  methods  COCOMO 
[Boehm 1981] and SLIM [Putnam 1980] both assume that the requirements will remain
unchanged, and require an estimation of the size of the final product as input for the models
[Londeix 1987]. This size cannot be actually measured until late in the project.

The  standard  tools  used  to  control  all  types  of  projects,  including  PERT,  CPM,  and  Gantt, 
do  not  consider  coordination  and  communication  overhead.  Such  models  represent  sequential 
interdependencies  through  explicit  representation  of  precedence  relationships  between  activities. 
This  simplified  vision  of  a  project  cannot  address  the  dynamics  created  by  reciprocal 
requirements of information in concurrent activities, exception management, and the impact of
actor interactions. Since the missing factors increase time requirements, the estimates resulting
from these generic project estimation models are overly optimistic.

These issues are addressed by Vit Project [Levitt 1999, Thomsen et al. 1999]. Vit Project is
applicable to projects in which a) all activities in the project can be predefined; b) the
organization is static, and all activities are pre-assigned to actors in the static organization; c) the
exceptions to activities result in extra work volume for the predefined activities and are carried
out by the pre-assigned actors; and d) actors are assumed to have congruent goals. The model is
well suited for simulating organizations that deal with great amounts of information processing
and coordination. Such characteristics are extremely relevant in software processes [Boehm
1981]. However, this approach requires a fixed work breakdown structure, and therefore does
not apply at the early stages when requirements are changing and the set of tasks comprising the
project is still uncertain.

By  using  informal  risk  assessment  models,  using  estimation  models  based  on  optimistic 
assumptions  that  require  parameters  difficult  to  provide  until  late,  and  using  optimistic  project 
control  tools,  project  managers  condemn  themselves  to  overrun  schedules  and  cost. 

4.  The  Proposed  Project  Risk  Model 

Our approach is based on metrics automatically collectable from the engineering database
from the beginning of the development. The indicators used are Requirements Volatility
(RV), Complexity (CX), and Efficiency (EF).


Requirements Volatility (RV): RV is a measure of three characteristics of the requirements: a) the
Birth-Rate (BR), that is the percentage of new requirements incorporated in each cycle of the
evolution process; b) the Death-Rate (DR), that is the percentage of requirements dropped in
each cycle; and c) the Change-Rate (CR) defined as the percentage of requirements changed
from the previous version. A change in one requirement is modeled as a birth of a new
requirement and the death of another, so that CR is included in the measured values of BR and
DR. RV is calculated as follows: RV = BR + DR.
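
As a rough illustration of this calculation, the sketch below compares two versions of a requirements set. It assumes each version is a mapping from a requirement identifier to its text, and that the percentages are taken relative to the size of the previous version; the text does not state the denominator, so that choice, like the function name, is an assumption.

```python
def requirements_volatility(previous: dict[str, str],
                            current: dict[str, str]) -> dict[str, float]:
    """Return BR, DR and RV (in percent) between two requirement versions.

    A requirement whose text changed counts as one death (old text) plus one
    birth (new text), so the Change-Rate is absorbed into BR and DR.
    Assumption: percentages are relative to the previous version's size.
    """
    if not previous:
        raise ValueError("previous version must contain at least one requirement")
    old_items = set(previous.items())
    new_items = set(current.items())
    births = len(new_items - old_items)   # new or changed requirements
    deaths = len(old_items - new_items)   # dropped or changed requirements
    base = len(old_items)
    br = 100.0 * births / base
    dr = 100.0 * deaths / base
    return {"BR": br, "DR": dr, "RV": br + dr}


# Hypothetical cycle: ten requirements; R1 is changed, R2 is dropped, R11 is added.
v1 = {f"R{i}": f"text {i}" for i in range(1, 11)}
v2 = dict(v1)
v2["R1"] = "text 1 (revised)"
del v2["R2"]
v2["R11"] = "text 11"

print(requirements_volatility(v1, v2))
```

In this example the changed requirement R1 contributes to both rates, which is why no separate CR term appears in the sum.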

Complexity  (CX):  Complexity  of  the  requirements  is  measured  from  a  formal  specification.  A 
representation that supports computer-aided prototyping, such as PSDL [Luqi
19%]  is  useful  in  the  context  of  evolutionary  prototyping.  We  define  a  complexity  metric 
called  Large  Granularity  Complexity  (LGC)  that  is  calculated  as  follows:  LGC  =  O  +  D  +  T, 
where  for  PSDL  O  is  the  number  of  atomic  operators  (functions  or  state  machines),  D  is  the 
number  of  atomic  data  streams  (data  connections  between  operators),  and  T  is  the  number  of 
abstract  data  types  required  for  the  system.  Operators  and  data  streams  are  the  components  of  a 
dataflow  graph.  This  is  a  measure  of  the  complexity  of  the  prototype  architecture,  similar  in 
spirit  to  function  points  but  more  suitable  for  modeling  embedded  and  real-time  systems.  The 
measure  can  also  be  applied  to  other  modeling  notations  that  represent  modules,  data 
connections, and abstract data types or classes. We found a strong correlation between the
complexity measured in LGC and the size of PSDL specifications (correlation coefficient R =
0.996).  Most  important,  we  also  found  a  strong  correlation  (R  =  0.898)  between  the  complexity 
measured  in  LGC  and  the  size  of  the  final  product  expressed  in  non-comment  lines  of  Ada  code 
including  both  the  code  automatically  created  by  the  generator  and  the  code  manually 
introduced  by  the  programmers. 
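
The LGC formula is a plain sum of three counts, as the following sketch shows. The class and field names are illustrative, and the counts in the example are hypothetical rather than measurements of any prototype discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureCounts:
    """Counts taken from a prototype architecture (names are illustrative)."""
    atomic_operators: int      # O: functions or state machines
    atomic_data_streams: int   # D: data connections between operators
    abstract_data_types: int   # T: abstract data types required by the system


def large_granularity_complexity(counts: ArchitectureCounts) -> int:
    """LGC = O + D + T, as defined in the text."""
    values = (counts.atomic_operators,
              counts.atomic_data_streams,
              counts.abstract_data_types)
    if any(v < 0 for v in values):
        raise ValueError("counts must be non-negative")
    return sum(values)


# Hypothetical architecture: 26 operators, 40 data streams, 9 data types.
example = ArchitectureCounts(atomic_operators=26,
                             atomic_data_streams=40,
                             abstract_data_types=9)
print(large_granularity_complexity(example))
```

Because the inputs are simple counts over the dataflow graph, a tool that already stores the specification could in principle compute them without manual effort.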

Efficiency (EF): The efficiency of the organization is measured using a direct observation of the
use of time. EF is calculated as a ratio between the time dedicated to direct labor and the idle
time: EF = Direct Labor Time / Idle Time. We found that this easily measurable quantity was a
good discriminator between high team productivity and low team productivity in a set of
simulated software projects [Nogueira 2000].

We validated and calibrated our model with a series of simulated software projects using
Vit  Project.  This  tool  was  chosen  because  of  the  inclusion  of  communications  and  exceptions  in 
its  project  dynamics  model,  and  because  it  has  been  extensively  validated  for  many  types  of 
engineering  projects,  including  software  engineering  projects.  The  input  parameters  for  the 
simulated  scenarios  were  RV,  EF  and  CX,  and  the  observed  output  was  the  development  time. 

Given  that  the  proposed  model  uses  parameters  collected  during  the  early  phases  and  given  that 
Vit  Project  requires  a  complete  breakdown  structure  of  the  project,  which  can  be  done  only  in 
the  late  phases,  there  was  a  considerable  time  gap  between  the  two  measurements.  This  time  gap 
is  less  than  for  a  post-mortem  analysis,  but  it  is  sufficient  for  model  calibration  and  validation 
purposes. 

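The simulation results were analyzed statistically, with the finding that the Weibull
probability distribution was the best fit for all the samples. A random variable x is said to have a
Weibull distribution with parameters $\alpha$, $\beta$ and $\gamma$ (with $\alpha > 0$, $\beta > 0$) if the probability density
function (pdf) and cumulative distribution function (cdf) of x are respectively:

\[
f(x; \alpha, \beta, \gamma) =
\begin{cases}
0, & x < \gamma \\
\dfrac{\alpha}{\beta^{\alpha}} (x - \gamma)^{\alpha - 1} \exp\!\left(-\left(\dfrac{x - \gamma}{\beta}\right)^{\alpha}\right), & x \ge \gamma
\end{cases}
\]

\[
F(x; \alpha, \beta, \gamma) =
\begin{cases}
0, & x < \gamma \\
1 - \exp\!\left(-\left(\dfrac{x - \gamma}{\beta}\right)^{\alpha}\right), & x \ge \gamma
\end{cases}
\]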

The  random  variable  under  study,  x,  can  be  interpreted  as  development  time  in  our  context. 
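
For readers who want to experiment with the three-parameter Weibull distribution defined above, here is a small sketch of its pdf and cdf using only the standard library. The function names and the example parameter values are hypothetical.

```python
import math


def weibull_pdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Density of the three-parameter Weibull distribution.

    alpha: shape, beta: scale, gamma: shift (location).
    Note: for alpha < 1 the density is unbounded at x == gamma and this
    sketch raises ZeroDivisionError there.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if x < gamma:
        return 0.0
    z = (x - gamma) / beta
    # (alpha / beta**alpha) * (x - gamma)**(alpha - 1) == (alpha / beta) * z**(alpha - 1)
    return (alpha / beta) * z ** (alpha - 1) * math.exp(-(z ** alpha))


def weibull_cdf(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Probability that the random variable is at most x."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    if x < gamma:
        return 0.0
    z = (x - gamma) / beta
    return -math.expm1(-(z ** alpha))  # equals 1 - exp(-z**alpha)


# Hypothetical parameters: shape 2.0, scale 50 days, shift 100 days.
print(weibull_cdf(150.0, alpha=2.0, beta=50.0, gamma=100.0))
print(weibull_pdf(130.0, alpha=2.0, beta=50.0, gamma=100.0))
```

With development time as the random variable, `weibull_cdf(x, ...)` reads as the probability of finishing on or before day x, and no completion is possible before day `gamma`.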

The shape parameter $\alpha$ controls the skew of the pdf, which is not symmetric. We found that this
is mostly related to the efficiency of the organization (EF). The scale parameter $\beta$ stretches or
compresses the graph in the x direction. We found that this parameter is related to the efficiency
(EF), requirements volatility (RV), and complexity (CX) measured in LGC. The shifting
parameter $\gamma$ shifts the origin of the curves to the right. We found that it is mostly related to the
complexity measured in LGC.

Based  on  best  fit  to  our  simulation  results,  the  model  parameters  can  be  derived  from  the 
project  metrics  using  the  following  algorithm: 

if (EF > 2.0) then
    α = 1.95;
    γ = 22 * 0.32 * (13 * ln(LGC) - 82);
    β = γ / (5.71 + (RV - 20) * 0.046);
else
    α = 2.5;
    γ = 22 * 0.85 * (13 * ln(LGC) - 82);
    β = γ / (5.47 - (RV - 20) * 0.114);
end if;
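
The algorithm above translates directly into a short function. This sketch assumes that RV is expressed in percent (as the term RV - 20 suggests) and that ln denotes the natural logarithm; the function and type names are illustrative and not part of the original tooling.

```python
import math
from typing import NamedTuple


class WeibullParams(NamedTuple):
    alpha: float  # shape
    beta: float   # scale
    gamma: float  # shift, in days


def model_parameters(ef: float, rv: float, lgc: float) -> WeibullParams:
    """Derive the Weibull parameters from the three project indicators.

    ef:  efficiency (direct labor time / idle time)
    rv:  requirements volatility, assumed to be in percent
    lgc: large granularity complexity (O + D + T)
    """
    if lgc <= 0:
        raise ValueError("LGC must be positive")
    log_term = 13.0 * math.log(lgc) - 82.0
    if ef > 2.0:
        alpha = 1.95
        gamma = 22.0 * 0.32 * log_term
        beta = gamma / (5.71 + (rv - 20.0) * 0.046)
    else:
        alpha = 2.5
        gamma = 22.0 * 0.85 * log_term
        beta = gamma / (5.47 - (rv - 20.0) * 0.114)
    return WeibullParams(alpha, beta, gamma)


# Hypothetical project: efficient team, 20% volatility, LGC of 1000.
print(model_parameters(ef=2.5, rv=20.0, lgc=1000.0))
```

Note that with these coefficients the shift is positive only when 13 ln(LGC) exceeds 82, that is, for LGC above roughly 550, so the fitted formulas should not be expected to give meaningful values for much smaller architectures.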

The  model  estimates  the  following  cumulative  probability  distribution  for  project  completion  on 
or  before  time  x: 

\[ P(x) = 1 - \exp\!\left(-\left(\dfrac{x - \gamma}{\beta}\right)^{\alpha}\right), \quad \text{where } x \text{ is time in days} \]

This  equation  can  be  inverted  to  obtain  the  schedule  length  needed  to  have  a  probability  P  of 
completing  within  schedule,  with  the  following  result. 

\[ x = \gamma + \beta \left(-\ln(1 - P)\right)^{1/\alpha} \]
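
The inversion can be checked numerically with a short sketch: compute the schedule length for a chosen confidence level, then feed it back into the cumulative distribution. The parameter values below are hypothetical and the function names are illustrative.

```python
import math


def completion_probability(x: float, alpha: float, beta: float, gamma: float) -> float:
    """Probability of completing on or before day x under the model."""
    if x < gamma:
        return 0.0
    return 1.0 - math.exp(-(((x - gamma) / beta) ** alpha))


def schedule_for_confidence(p: float, alpha: float, beta: float, gamma: float) -> float:
    """Schedule length (days) giving probability p of completing within schedule."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # -log1p(-p) equals -ln(1 - p) and stays accurate for small p.
    return gamma + beta * (-math.log1p(-p)) ** (1.0 / alpha)


# Hypothetical parameters: shape 1.95, scale 10 days, shift 55 days.
alpha, beta, gamma = 1.95, 10.0, 55.0
days = schedule_for_confidence(0.95, alpha, beta, gamma)
print(days)
print(completion_probability(days, alpha, beta, gamma))  # recovers about 0.95
```

Asking for a higher confidence level always lengthens the schedule, and a probability of exactly 1 is excluded because the distribution has no finite upper bound.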

The  probability  P  can  be  interpreted  as  a  degree  of  confidence  in  the  ability  of  the  project  to 
successfully  complete  within  a  schedule  of  length  x.  Applying  the  above  equation  to  estimate 
the development time needed for a 95% chance of completion within schedule for 16 different
scenarios simulated using Vit Project, we observed a standard error of 22 days. The worst case
was  an  error  of  60  days  for  a  project  of  520  days  (12%).  The  comparison  of  estimated  time  and 
simulated  time  is  shown  below. 


[Figure: estimated development time plotted against simulated project completion time, both in days, on axes running from 0 to 700.]

5.  Integrating  Risk  Assessment  into  Prototyping 


The  model  presented  in  the  previous  section  is  designed  to  support  an  iterative  prototyping 
and  software  development  process.  In  this  process,  an  initial  problem  statement,  a  prototype 
demo or problem reports from a deployed software product trigger an issue analysis, followed
by  formulation  of  proposed  requirements  changes,  and  specification  of  a  proposed  adjustment  to 
the  software  requirements,  which  can  be  initially  empty.  At  this  point  in  each  cycle,  the  project 
manager  should  perform  a  risk  assessment  step.  The  results  of  the  risk  assessment  step  guide  the 
degree of detail to which requirements enhancements are demonstrated, and the set of
requirements issues to be considered in the next prototyping cycle, if any.

The first measurement-based risk assessment step can be performed after specification of
the  first  version  of  the  prototype  architecture,  based  on  the  requirements  volatility,  LGC  and 
efficiency  measurements  from  the  steps  just  performed. 


In  cases  where  risk  assessments  are  required  even  earlier,  before  any  prototyping  has  been 
done,  estimates  of  team  efficiency  and  requirements  volatility  can  be  based  on  measurements  of 
similar  past  projects,  and  initial  complexity  estimates  can  be  based  on  subjective  guesswork  of 
the  kind  currently  used  in  the  macro  model  approaches.  This  kind  of  estimate  may  be  less 
reliable  than  those  based  solely  on  measurements,  but  it  can  provide  a  principled  and  reasonably 
accurate  basis  for  deciding  whether  or  not  to  start  a  prototyping  process  to  determine  the 
requirements  for  a  proposed  development  project.  Thus  parts  of  our  approach  can  be  used  truly 
at  the  very  beginning  of  the  process. 

If  a  prototyping  effort  is  approved,  early  measurements  of  the  process  could  be  used  to 
refine  the  initial  estimates  of  the  model  parameters  using  Bayesian  methods,  thus  providing  a 
balanced  and  systematic  transition  from  subjective  guesswork,  coded  as  an  a  priori  distribution, 
to  assessments  increasingly  based  on  systematic  measurement.  Such  an  approach  also  supports 
incorporation  and  systematic  refinement  of  measurements  from  previous  cycles  of  the  iterative 
prototyping  process. 
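
The text does not say which Bayesian method would be used. As one textbook possibility, the sketch below applies a normal prior with a normal measurement model of known variance (a conjugate update) to a single indicator such as requirements volatility. All numbers and names are hypothetical.

```python
from typing import Sequence


def update_normal_estimate(prior_mean: float, prior_var: float,
                           observations: Sequence[float],
                           obs_var: float) -> tuple[float, float]:
    """Conjugate normal update of an indicator estimate.

    prior_mean, prior_var: the a priori guess and its variance
    observations:          measurements from completed prototyping cycles
    obs_var:               assumed (known) variance of one measurement
    Returns the posterior mean and posterior variance.
    """
    if prior_var <= 0 or obs_var <= 0:
        raise ValueError("variances must be positive")
    precision = 1.0 / prior_var + len(observations) / obs_var
    mean = (prior_mean / prior_var + sum(observations) / obs_var) / precision
    return mean, 1.0 / precision


# Hypothetical: a subjective guess of RV = 20% (variance 100),
# then three measured cycles with an assumed measurement variance of 25.
mean, var = update_normal_estimate(20.0, 100.0, [30.0, 34.0, 32.0], 25.0)
print(mean, var)
```

Each additional cycle shrinks the posterior variance and pulls the estimate toward the measured values, which matches the intended shift from guesswork to measurement.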

The  results  of  risk  assessment  can  provide  guidance  on  the  degree  to  which  the  project  can 
afford  to  explore  requirements  enhancements  requested  by  the  customers.  It  can  also  help 
customers  or  marketing  departments  to  decide  how  much  they  really  want  possible 
improvements,  in  the  context  of  the  resulting  time  and  cost  estimates.  Systematic  cost/benefit 
analysis  becomes  possible  only  with  the  availability  of  reasonably  accurate  estimates. 

The  risk  assessment  step  can  thus  provide  a  balancing  force  to  stabilize  the  requirements 
formulation  process.  In  the  absence  of  information  on  how  much  potential  enhancements  will 
cost,  stakeholders  are  prone  to  unrealistic  requirements  amplification  —  of  course  they  would 
always  like  to  have  a  better  system,  no  matter  how  good  the  existing  one  is,  if  you  do  not  ask 
them  to  pay  for  the  improvements.  The  proposed  risk  assessment  steps  can  provide  a  realistic 
basis  for  incorporating  time  and  cost  constraints  and  cost/benefit  tradeoffs  early  in  the  process, 
when  the  situation  is  fluid  and  many  options  are  open. 

This  process  refinement  provides  some  additional  insight  into  the  dynamics  of  iterative 
prototyping:  the  iterative  process  should  stop  when  the  customers  have  determined  what 
requirements  they  can  afford  to  realize,  and  which  of  many  possible  improvements  they  will  be 
willing  to  pay  for,  if  any.  It  is  not  necessarily  the  case  that  the  set  of  criticisms  elicited  by  the 
final  round  of  prototype  demonstrations  is  empty  —  that  is  true  only  in  an  idealized  world  with 
adequate  budgets  and  patient  customers. 

6.  Conclusion 

This  paper  introduces  a  formal  risk  assessment  model  for  software  projects  based  on 
probabilities  and  metrics  automatically  collectable  from  the  project  baseline.  The  approach 
enables  a  project  manager  to  evaluate  the  probability  of  success  of  the  project  very  early  in  the 
life  cycle,  during  an  iterative  requirements  formulation  process,  based  on  well-defined 
measurements  rather  than  just  guesswork  or  subjective  judgments. 

For  more  than  twenty  years,  estimation  standards  have  been  characterized  by  a  common 
limitation: the requirements should be frozen in order to make estimates. The model presented
in  this  paper  removes  this  important  limitation,  facing  the  reality  that  requirements  are 
inherently  variable. 

The  model  is  perfectly  suited  for  any  evolutionary  software  process  because  it  follows  the 
same  philosophy.  The  risk  assessment  and  estimation  steps  are  conducted  at  each  evolutionary 
cycle with increasing knowledge and decreasing variance. The research formalizes an
improvement in the evolutionary software process, introducing a risk assessment step that can
be automated, and that can help shape the planning of the project in the early stages when there
is still substantial freedom to allocate available time and budget.

Abstract

APT  (Automated  Prototyping  Tool-Kit) 
is  an  integrated  set  of  software  tools  that 
generate  source  programs  directly  from 
real-time  requirements.  The  APT  system 
uses  a  fifth-generation  prototyping 
language  to  model  the  communication 
structure,  timing  constraints,  I/O  control, 
and  data  buffering  that  comprise  the 
requirements  for  an  embedded  software 
system.  The  language  supports  the 
specification  of  hard  real-time  systems 
with  reusable  components  from  domain 
specific  component  libraries.  APT  has 
been  used  successfully  as  a  research  tool 
in  prototyping  large  war-fighter  control 
systems  (e.g.  the  command-and-control 
station,  cruise  missile  flight  control 
system, Patriot missile defense systems)
and  demonstrated  its  capability  to 
support  the  development  of  large 
complex  embedded  software. 

Keywords:  APT,  Automated 
Prototyping,  Real-Time  Systems, 
Command  and  Control,  Formal  Methods, 
Evolution, Reuse, Architecture,
Components, PSDL

1  INTRODUCTION 

Software project managers are
often faced with the problem of inability
to accurately and completely specify
requirements for real-time software
systems, resulting in poor productivity,
schedule overruns, unmaintainable and
unreliable software. APT is designed to
assist program managers to rapidly
evaluate requirements for military
real-time control software using executable
prototypes,  and  to  test  and  integrate 
completed  subsystems  through 
evolutionary  prototyping.  APT  provides 
a  capability  to  quickly  develop 
functional  prototypes  to  verify  feasibility 
of  system  requirements  early  in  the 
software  development  process.  It 
supports  an  evolutionary  development 
process  that  spans  the  complete  life- 
cycle  of  real-time  software. 

2  THE  AUTOMATED 
PROTOTYPING  TOOL-KIT  (APT) 

The  value  of  computer  aided  prototyping 
in  software  development  is  clearly 
recognized.  It  is  a  very  effective  way  to 
gain  understanding  of  the  requirements, 
reduce  the  complexity  of  the  problem 
and  provide  an  early  validation  of  the 
system design. Bernstein estimated that
for every dollar invested in prototyping,
one can expect a $1.40 return within the
life cycle of the system development [1].
To be effective, prototypes must be
constructed and modified rapidly,
accurately, and cheaply [8]. Computer
aid for rapidly and inexpensively
constructing and modifying prototypes
makes this feasible [10]. The Automated
Prototyping  Tool-kit  (APT),  a  research 
tool  developed  at  the  Naval  Postgraduate 
School,  is  an  integrated  set  of  software 
tools  that  generate  source  programs 
directly  from  high  level  requirements 
specifications  [7]  (Figure  1). 

It  provides  the  following  kinds  of 
support  to  the  prototype  designer: 

(1)  timing  feasibility  checking  via 
the  scheduler, 

(2) consistency checking and
automated assistance for project
planning, configuration
management, scheduling,
designer task assignment, and
project completion date
estimation via the Evolution
Control System,

(3)  computer-aided  design 
completion  via  the  editors, 

(4)  computer-aided  software  reuse 
via  the  software  base,  and 

(5)  automatic  generation  of  wrapper 
and  glue  code. 

The  efficacy  of  APT  has  been 
demonstrated  in  many  research  projects 
at  the  Naval  Postgraduate  School  and 
other  facilities. 


Figure 1. The APT Rapid Prototyping Environment

2.1 Overview of the APT Method


There  are  four  major  stages  in  the  APT 
rapid  prototyping  process:  software 
system  design,  construction,  execution, 
and  requirements  evaluation  and/or 
modification  (Figure  2). 

The  initial  prototype  design  starts  with 
an  analysis  of  the  problem  and  a 
decision  about  which  parts  of  the 
proposed  system  are  to  be  prototyped. 
Requirements  for  the  prototype  are  then 
generated,  either  informally  (e.g. 
English)  or  in  some  formal  notation. 
These  requirements  may  be  refined  by 
asking  users  to  verify  their  completeness 
and  correctness. 

After  some  requirements  analysis,  the 
designer  uses  the  APT  PSDL  editor  to 
draw  dataflow  diagrams  annotated  with 
nonprocedural  control  constraints  as  part 
of  the  specification  of  a  hierarchically 
structured  prototype,  resulting  in  a 
preliminary, top-level design free from
programming level details. The user may
continue  to  decompose  any  software 
module  until  its  components  can  be 
realized  via  reusable  components  drawn 
from  the  software  base  or  new  atomic 
components. 

This  prototype  is  then  translated  into  the 
target  programming  language  for 
execution  and  evaluation.  Debugging 
and  modification  utilize  a  design 
database  that  assists  the  designers  in 
managing  the  design  history  and 
coordinating  change,  as  well  as  other 
tools  shown  in  Figure  3. 

2.2 APT as a Requirements
Engineering Tool

The  requirements  for  a  software  system 
are  expressed  at  different  levels  of 
abstraction  and  with  different  degrees  of 
formality. The highest level
requirements are usually informal and
imprecise, but they are understood best
by  the  customers.  The  lower  levels  are 
more  technical,  precise,  and  better  suited 
for  the  needs  of  the  system  analysts  and 
designers,  but  they  are  further  removed 
from  the  user's  experiences  and  less  well 
understood  by  the  customers.  Because  of 
the  differences  in  the  kinds  of 
descriptions  needed  by  the  customers 
and  developers,  it  is  not  likely  that  any 
single  representation  for  requirements 
can  be  the  “best”  one  for  supporting  the 
entire  software  development  process. 
APT  provides  the  necessary  means  to 
bridge  the  communication  gap  between 
the  customers  and  developers.  The  APT 
tools  are  based  on  the  Prototype  System 
Description  Language  (PSDL),  which  is 
designed  specifically  for  specifying  hard 
real-time  systems  [5,  6].  It  has  a  rich  set 
of  timing  specification  features  and 
offers  a  common  baseline  from  which 
users  and  software  engineers  describe 
requirements.  The  PSDL  descriptions  of 
the  prototype  produced  by  the  PSDL 
editor  are  very  formal,  precise  and 
unambiguous,  meeting  the  needs  of  the 
system  analysts  and  designers.  The 
demonstrated  behavior  of  the  executable 
prototype,  on  the  other  hand,  provides 
concrete  information  for  the  customer  to 
assess  the  validity  of  the  high  level 
requirements  and  to  refine  them  if 
necessary. 

2.3  APT  as  a  System  Testing  and 
Integration  Tool 

Unlike  throw-away  prototypes,  the 
process  supported  by  APT  provides 
requirements  and  designs  in  a  form  that 
can  be  used  in  construction  of  the 
operational  system.  The  prototype 
provides  an  executable  representation  of 
system  requirements  that  can  be  used  for 
comparison during system testing. The
existence of a flexible prototype can
significantly  ease  system  testing  and 
integration.  When  final  implementations 
of  subsystems  are  delivered,  integration 
and  testing  can  begin  before  all  of  the 
subsystems  are  complete  by  combining 
the  final  versions  of  the  completed 
subsystems  with  prototype  versions  of 
the  parts  that  are  still  being  developed. 
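
The mixed integration described above can be pictured as a lookup that prefers a delivered subsystem and falls back to its prototype version. The following Python sketch is only an illustration under stated assumptions: the function `assemble`, the dictionary-based subsystem registry, and the stand-in subsystem functions are hypothetical and are not part of APT.

```python
from typing import Callable, Dict

Subsystem = Callable[[dict], dict]


def assemble(prototypes: Dict[str, Subsystem],
             delivered: Dict[str, Subsystem]) -> Dict[str, Subsystem]:
    """Use the delivered implementation where one exists, else the prototype."""
    unknown = set(delivered) - set(prototypes)
    if unknown:
        # A delivered subsystem must replace a subsystem of the prototype.
        raise KeyError(f"no prototype counterpart for: {sorted(unknown)}")
    return {name: delivered.get(name, proto)
            for name, proto in prototypes.items()}


# Hypothetical stand-ins for prototype and final subsystem versions.
def proto_comms(msg: dict) -> dict:
    return {"source": "prototype", **msg}


def proto_sensors(msg: dict) -> dict:
    return {"source": "prototype", **msg}


def final_sensors(msg: dict) -> dict:
    return {"source": "final", **msg}


system = assemble(
    prototypes={"comms_interface": proto_comms, "sensors": proto_sensors},
    delivered={"sensors": final_sensors},
)
print({name: impl({})["source"] for name, impl in system.items()})
```

In this sketch testing can start as soon as one final subsystem is registered; the remaining entries keep pointing at their prototype versions until they are replaced.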

2.4 APT as an Acquisition Tool

Decisions  about  awarding  contracts  for 
building  hard  real-time  systems  are  risky 
because  there  is  little  objective  basis  for 
determining  whether  a  proposed  contract 
will  benefit  the  sponsor  at  the  time  when 
those  decisions  must  be  made.  It  is  also 
very  difficult  to  determine  whether  a 
delivered  system  meets  its  requirements. 
APT,  besides  being  a  useful  tool  to  the 
hard  real-time  system  developers,  is  also 
very  useful  to  the  customers.  Acquisition 
managers  can  use  APT  to  ensure  that 
acquisition  efforts  stay  on  track  and  that 
contractors  deliver  what  they  promise. 
APT  enables  validation  of  requirements 
via prototyping demonstration, greatly
reducing the risk of contracting for
real-time systems.

2.5  A  Platform  Independent  User 
Interface 

The  current  APT  system  provides  two 
interfaces  for  users  to  invoke  different 
APT  tools  and  to  enter  the  prototype 
specification. The main interface (Figure
3) was developed using the TAE+
Workbench [11]. The Ada source code
generated automatically from the graphic
layout uses libraries that only work on
SunOS 4.1.x operating systems. The
PSDL editor (Figure 4), which allows
users to specify the prototype via
augmented dataflow diagrams, was
implemented in C++ and can only be
executed under SunOS 4.1.x
environments. A portable
implementation of the APT main
interface and the PSDL editor was
needed to allow users to use APT to
build PSDL prototypes on different
platforms. We chose to overcome these
limitations by reimplementing the main
interface (Figure 5) and the PSDL editor
(Figure 6) using the Java programming
language [2].

Figure 3. Main Interface of APT Release 2.0

Figure 5. Main Interface of the new APT


The  new  graphical  user  interface,  called 
the  Heterogeneous  Systems  Integrator 
(HSI),  is  similar  to  the  previous  APT. 
Users  of  previous  APT  versions  will 
easily  adapt  to  the  new  interface.  There 
are  some  new  features  in  this 
implementation,  which  do  not  affect  the 
functionality  of  the  program,  but  provide 
a  friendlier  interface  and  easier  use.  The 
major  improvement  is  the  addition  of  the 
tree  panel  on  the  left  side  of  the  editor. 
The tree panel provides a better view of
the overall prototype structure since all
of the PSDL components can be seen in
a hierarchy. The user can navigate
through  the  prototype  by  clicking  on  the 
names  of  the  components  on  the  tree 
panel.  Thus,  it  is  possible  to  jump  to  any 
level  in  the  hierarchy,  which  was  not 
possible  earlier. 

3 A SIMPLE EXAMPLE:
PROTOTYPING A C3I
WORKSTATION

To create a first version of a new
prototype, users can select “New” from
the “Prototype” pull-down menu of the
APT main interface (Figure 7). The user
will then be asked to provide the name
of the new prototype (say “c3i_system”)
and the APT PSDL editor will be
automatically invoked with a single
initial root operator (with the same name
as the prototype).


APT  allows  the  user  to  specify  the 
requirements  of  prototypes  as  augmented 
dataflow  graphs.  Using  the  drawing  tools 
provided  by  the  PSDL  editor,  the  user 
can  create  the  top-level  dataflow 
diagram of the c3i_system prototype as
shown in Figure 8, where the c3i_system
prototype is modeled by nine modules,
communicating with each other via data
streams. To model the dynamic behavior
of these modules, the dataflow diagram
is augmented with control and timing
constraints. For example, the user may
want to specify that the
weapons_interface module has a
maximum response time of 3 seconds to
handle the event triggered by the arrival
of new data in the weapon_status_data
stream, and it only writes output to the
weapons_emrep stream if the status of the
weapon_status_data is damaged,
service_required, or out_of_ammunition.
APT allows the user to specify these
timing and control constraints using the
pop-up operator property menu (Figure
9), resulting in a top-level PSDL
program shown in Figure 10.

To  complete  the  specification  of  the 
c3i_system  prototype,  the  user  must 
specify  how  each  module  will  be 
implemented  by  choosing  the 
implementation  language  for  the  module 
via  the  operator  property  menu.  The 
implementation  of  a  module  can  be  in 
either  the  target  programming  language 
or PSDL. A module with an
implementation in the target
programming language is called an
atomic operator. A module that is
decomposed into a PSDL
implementation is called a composite
operator. Module decomposition can be
done  by  selecting  the  corresponding 
operator  in  the  tree-panel  on  the  left  side 
of  the  PSDL  editor. 
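
The distinction between atomic and composite operators can be sketched as a small tree structure. The Python below is a hedged illustration only: the class `Operator`, the helper `count_operators`, and the sub-operator names `sub_a` and `sub_b` are hypothetical and do not reflect APT's internal representation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Operator:
    """A PSDL operator; the field names are illustrative, not APT's."""
    name: str
    children: List["Operator"] = field(default_factory=list)

    @property
    def is_composite(self) -> bool:
        # A composite operator is one decomposed into sub-operators;
        # an operator without a decomposition is treated as atomic.
        return bool(self.children)


def count_operators(root: Operator) -> Tuple[int, int]:
    """Return (composite, atomic) counts for the hierarchy under root."""
    composite = atomic = 0
    stack = [root]
    while stack:
        op = stack.pop()
        if op.is_composite:
            composite += 1
            stack.extend(op.children)
        else:
            atomic += 1
    return composite, atomic


# Hypothetical fragment in which only comms_interface is decomposed.
comms = Operator("comms_interface", [Operator("sub_a"), Operator("sub_b")])
root = Operator("c3i_system",
                [comms, Operator("weapons_interface"), Operator("sensors")])
print(count_operators(root))
```

Decomposing another module in a later version of the prototype corresponds to giving one more leaf a list of children; the rest of the tree is unchanged.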

APT supports an incremental
prototyping process. The user may
choose to implement all nine modules as
atomic operators (using dummy
components) in the first version, so as to
check out the global effects of the timing
and control constraints. Then, he/she
may choose to decompose the
comms_interface module into more
detailed subsystems and implement the
sub-modules with reusable components,
while leaving the others as atomic
operators in the second version of the
prototype, and so on.

OPERATOR c3i_system
SPECIFICATION
  DESCRIPTION
    {This module implements a simplified version of
     a generic C3I workstation.}
END

IMPLEMENTATION
  GRAPH

  DATA STREAM
    -- Type declarations for the data streams in the graph go here.

  CONTROL CONSTRAINTS
    OPERATOR comms_links
      PERIOD 30000 MS

    OPERATOR navigation_system
      PERIOD 30000 MS

    OPERATOR sensors
      PERIOD 30000 MS

    OPERATOR weapons_systems
      PERIOD 30000 MS

    OPERATOR weapons_interface
      TRIGGERED BY SOME weapon_status_data
      MINIMUM CALLING PERIOD 2000 MS
      MAXIMUM RESPONSE TIME 3000 MS
      OUTPUT weapons_emrep
        IF weapon_status_data.status = damaged
        OR weapon_status_data.status = service_required
        OR weapon_status_data.status = out_of_ammunition
END

Figure 10. Top-level Specification of the c3i_system
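
The output guard on the weapons_interface operator in Figure 10 can be read as an ordinary conditional. The Python sketch below is only an illustration of that reading and is not code generated by APT: the function mirrors the operator name, while the dictionary representation of stream values and the content of the emitted report are assumptions.

```python
from typing import Optional

# Status values that allow output on the weapons_emrep stream (from Figure 10).
REPORTABLE = {"damaged", "service_required", "out_of_ammunition"}


def weapons_interface(weapon_status_data: dict) -> Optional[dict]:
    """Return a value for weapons_emrep, or None when the guard blocks output."""
    status = weapon_status_data["status"]
    if status in REPORTABLE:
        # Hypothetical report content; the source does not define it.
        return {"status": status}
    return None


print(weapons_interface({"status": "damaged"}))
print(weapons_interface({"status": "ready"}))
```

The timing constraints (minimum calling period and maximum response time) are not modeled here; in APT they are handled by the scheduler rather than by the operator body.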


To  facilitate  the  testing  of  the 
prototypes,  APT  provides  the  user  with 
an  execution  support  system  that 
consists  of  a  translator,  a  scheduler  and  a 
compiler.  Once  the  user  finishes 
specifying  the  prototype,  he/she  can 
invoke  the  translator  and  the  scheduler 
from  the  APT  main  interface  to  analyze 
the  timing  constraints  for  feasibility  and 
to  generate  a  supervisor  module  for  each 
subsystem  of  the  prototype  in  the  target 
programming  language.  Each  supervisor 
module  consists  of  a  set  of  driver 
procedures  that  realize  all  the  control 
constraints,  a  high  priority  task  (the 
static  schedule)  that  executes  the  time- 
critical  operators  in  a  timely  fashion,  and 
a  low  priority  dynamic  schedule  task  that 
executes  the  non-time-critical  operators 
when  there  is  time  available.  The 
supervisor  module  also  contains 
information  that  enables  the  compiler  to 
incorporate  all  the  software  components 
required  to  implement  the  atomic 
operators  and  generate  the  binary  code 
automatically. The translator/scheduler
also  generates  the  glue  code  needed  for 
timely  delivery  of  information  between 
subsystems  across  the  target  network. 
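
The split between a static schedule for time-critical operators and a dynamic schedule that uses leftover time can be pictured with a small simulation. The Python below is a simplified sketch, not the APT scheduler: it assumes non-preemptive dynamic operators, and the function `run_cycle`, the slot durations, and the operator names `display_update` and `log_writer` are hypothetical.

```python
def run_cycle(static_slots, dynamic_ops, cycle_ms):
    """Simulate one schedule cycle.

    static_slots: (start_ms, name, duration_ms) tuples, sorted by start
                  time and non-overlapping (the time-critical operators).
    dynamic_ops:  (name, duration_ms) tuples, run in order in idle gaps
                  only when they fit entirely before the next static slot.
    Returns (timeline, pending), where timeline holds (start_ms, name)
    and pending holds the dynamic operators that did not fit.
    """
    timeline = []
    pending = list(dynamic_ops)
    now = 0
    # A sentinel slot at cycle_ms lets the loop also fill the final gap.
    for start, name, duration in list(static_slots) + [(cycle_ms, None, 0)]:
        while pending and now + pending[0][1] <= start:
            dyn_name, dyn_duration = pending.pop(0)
            timeline.append((now, dyn_name))
            now += dyn_duration
        if name is not None:
            timeline.append((start, name))
        now = start + duration
    return timeline, pending


static_slots = [(0, "weapons_interface", 500), (2000, "sensors", 300)]
dynamic_ops = [("display_update", 1000), ("log_writer", 800)]
timeline, leftover = run_cycle(static_slots, dynamic_ops, cycle_ms=3000)
print(timeline)
print(leftover)
```

Static slots always start at their planned times, while a dynamic operator that does not fit in a gap simply waits for a later cycle. A real low-priority task would more likely be preempted than deferred; the non-preemptive rule is a simplification chosen to keep the sketch short.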

For  prototypes  which  require 
sophisticated  graphic  user  interfaces,  the 
APT  main  interface  provides  an 
interface  editor  to  interactively  sculpt  the 
interface. In the c3i_system prototype,
we chose to decompose the
comms_interface, the
track_database_manager and the
user_interface modules into subsystems,
resulting in a hierarchical design
consisting of 8 composite operators and
26 atomic operators. The user
interface of the prototype has a total of
14 panels, four of which are shown in
Figure 11. The corresponding Ada
program has a total of 10.5K lines of
source code. Among the 10.5K lines of
code, 3.5K lines come from the supervisor
module that was generated automatically
by the translator/scheduler and 1.7K
lines were automatically generated
by the interface editor [9].

4  CONCLUSION 

APT  has  been  used  successfully  as  a 
research tool in prototyping large
war-fighter control systems (e.g. the
command-and-control  station,  cruise 
missile  flight  control  system,  missile 
defense  systems)  and  demonstrated  its 
capability  to  support  the  development  of 
large  complex  embedded  software. 
Specific  payoffs  include: 

(1)  Formulate/validate  requirements 
via  prototype  demonstration  and 
user  feedback 

(2)  Assess  feasibility  of  real-time 
system  designs 

(3)  Enable  early  testing  and 
integration  of  completed 
subsystems 

(4)  Support  evolutionary  system 
development,  integration  and 
testing 

(5)  Reduce  maintenance  costs 
through  systematic  code 
generation 

(6)  Produce  high  quality,  reliable 
and  flexible  software 

(7)  Avoid  schedule  overruns 

In order to evaluate the benefits derived
from the practice of computer-aided
prototyping within the software
acquisition process, we conducted a case
study in which we compared the cost (in
dollar amounts) required to perform
requirements analysis and feasibility
study for the c3i system using the 2167A
process, in which the software is coded
manually, and the rapid prototyping
process, where part of the code is
automatically generated via APT [3]. We
found that, even under very conservative
assumptions, using the APT method
resulted in a cost reduction of $56,300, a
27% cost saving. Taking the results of
this comparison, then projecting to a
mission control software system, the
command and control segment (CCS),
we estimated that there would be a cost
saving of 12 million dollars. Applying
this concept to an engineering change to
a typical component of the CCS software
showed a further cost savings of
$25,000.
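
The two case-study figures (a saving of 56,300 dollars, reported as 27%) imply a baseline cost that the text does not state. The Python below is only a back-of-the-envelope check: the function `implied_baseline` is hypothetical, and the range assumes the percentage was rounded to the nearest whole number.

```python
def implied_baseline(saving: float, fraction: float) -> float:
    """Baseline cost implied by an absolute saving and the fraction it represents."""
    return saving / fraction


# Point estimate from the reported figures.
point = implied_baseline(56_300, 0.27)
# Bounds if 27% stands for any value between 26.5% and 27.5%.
low = implied_baseline(56_300, 0.275)
high = implied_baseline(56_300, 0.265)
print(f"{point:,.0f} (range {low:,.0f} to {high:,.0f})")
```

Under these assumptions the manually coded 2167A baseline for the c3i study would be roughly 208,500 dollars; this is an inference from the two reported numbers, not a figure taken from the case study itself.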


Figure 11. User Interface of the c3i_system

Complex Information System


Abstract: This paper introduces a graph-oriented model for conceptual level design of large, complex
information systems. The model has been shown to be highly effective for the system designer from the
perspectives of maintainability and upgradability. Because of its flat structure and its inability to
hold multidimensional data, the relational model does not provide a structural approach to the
system designer. The alternative, the object oriented model, offers a structured approach but is also
unable to describe the various intermodular relationships spread over the same or different levels within a data
model. The graph data model also allows dynamic regrouping of related entities at the designer's level.
We propose an appropriate data structure and have developed the corresponding DDL.
Test runs in a simulated environment further establish its computational efficiency.


Keywords: Graph oriented data model, Semantic view, Functional abstraction, Encapsulation of data
and relationships.


1.  Introduction 

The environments in which database management systems are being used have changed rapidly in
the last several years. Although the relational model has made a prominent contribution to research on
DBMS, recent database applications are outgrowing this model. The table based relational model is not
the best approach to express complex and diverse databases. In this model, relationships among records
are not structurally specified, and because of this flat structure the relational model is of little use to a
user attempting to comprehend the logical structure actually existing in a schema. The alternative, the
object oriented model, provides the concept of a class which can encapsulate homogeneous objects, but there are no direct means
to describe the mutual relationships amongst the objects within a class or to express the intermodular
relationships spread over the same or different levels. Our goal is therefore to design a data model providing a
structural approach which retains the desirable properties of the relational and object oriented models and
simultaneously overcomes the bottlenecks of these schemes through the incorporation of some new
features. In this effort, a graph based data model at the conceptual level, incorporating the concept of functional
abstraction, has been developed. A significant improvement is expected from the graph model
in the context of maintainability, adaptability and transparency from the view of a system designer.

Here we discuss related work done in the areas of graph-based data models, object oriented
approaches in graph data models, semi-structured data and view update. Abiteboul in [2] uses
semi-structured data that is neither raw data (file systems) nor strictly typed (table-oriented or object
oriented). Even if semi-structured data may have a structure, this structure is often implicit, and not as
rigid or regular as that found in standard database systems. In [3] an OQL-like query language extended
with information retrieval tools is proposed to query SGML and HTML documents. Buneman in [5]
defines semi-structured data as that for which the information normally associated with a schema is
contained within the data itself. An attempt has been made to represent semi-structured data as a graph-like
or tree-like structure, where edges are labeled to represent data types and leaves stand for raw data.
In [6] a query language (UnQL) is adapted, which overcomes some of the limitations of SQL-like languages
for semi-structured data. Related work is also being conducted in the areas of semi-structured data
access and querying. The graph data model presented here uses a structure similar to that of the ER data
model [7], representing atomic entities by nodes and relations among them by links. However, the
database schema represented by the ER data model is not accessible by the DBMS, whereas in the graph
data model the structure of the database is represented as part of the graph database itself. In [8] a graph
model is proposed as the underlying unified data model to access different databases expressed in standard
data models. The query language is formally defined in terms of graphical primitives (atomic queries). A
global information management system was developed, providing a global framework where data on the
web is accessed through conceptual views. GOOD [9,10,11] started as a database interface, then evolved
into a graph object oriented database system. It is a graph representation of an object oriented
database, where nodes represent objects and links represent relationships between objects. The GRAS
data model [12] relies on attributed graphs. In this model, objects are represented by typed nodes, which
may carry attributes. Relations between objects are modeled by bi-directional edges. In our model, we
focus more on the concept of semantic groups, providing functional
abstraction for querying and updating the data model, whereas the previous works focus on defining a
new approach to representing the graph data model itself.

Our goal is to provide the system designer with a tool for describing and maintaining a complex
semi-structured information system in a better way. We have therefore proposed a methodology to develop a
directed graph model at the logical level as (V,E), where a node in V represents a basic data object or a
functionally abstracted module and an edge in E denotes a binary relationship between entities present
in the graph data model. In the graph model we can encapsulate the nodes of a lower level under a
functional abstraction node from a specific semantic view. There is no restriction on the existence of
relationships among the nodes in a graph. Based on this graph based data model framework, we have
developed a data description language (DDL) for easy description and modification of the entities and
their relationships within a complex information system. A user-friendly script is provided to the
system designer for easy description of the conceptual level. For each statement (e.g.
relation, encapsulation) of the designer script, a mathematical script is also generated, and according to
the operations described in the mathematical script the software is executed, generating the data structure
as an output. These concepts have been crystallized in the form of a software tool, which has also been
implemented.


2.  The  proposed  data  model  and  corresponding  data  structure 

The conceptual level of a semi-structured information system is represented by the graph model
depicted in Fig. 1. Here the basic instances of entities (lowest level vertices) or the functionally abstracted
modules are indicated by vertices and the relationships among them by directed edges. In this graph data
model the vertices drawn as triangles, squares and circles indicate nodes at the lowest level,
intermediate level and topmost level respectively. The concept of encapsulation is implemented
within the graph with respect to a functional abstraction node from a specific semantic view, e.g. the
nodes 4, 5, 6, 7 and 8 are encapsulated in the same class under the functional abstraction node 2, reflecting a
specific semantic view. The parallel edges between the nodes 4 and 5 indicate the existence of two
different relations declared from two different semantic views, with respect to abstraction nodes 1 and
2. A suitable data structure is suggested to declare the conceptual level of the data model
depicted in Fig. 1. A pointer array maintains the growth of this graph, with each element of the array
representing a vertex in the graph. Each element of the array points to a doubly linked list: the right link
maintains the set of vertices encapsulated by it and the left link points to the set of vertices within each
of which it exists as a vertex of the encapsulated subgraph.


(Figure 1: The Data Model)


Each element of the linked list is a structure of three elements: the vertex number, the type of relation
(encapsulation is represented by tag L and a direct edge is denoted by tag E) and the functional abstraction
node on which the relation is based. For creation of the data structure it is assumed
that the highest level vertices 1, 2 and 9 are encapsulated within the vertex 0, which is treated as the topmost
level node. This is indicated by a high value in the left link of vertex 0. The right link
of node 1 indicates that an encapsulation class has been formed with the members 3, 4 and 5 under the
functional abstraction node 1 and that there is a direct edge from 1 to 2 defined with respect to abstraction
node 0. Similarly the left link indicates that the functional abstraction node 1 itself is encapsulated as a
member under the node 0 and that there is a direct edge from 9 to the node 1. Also, from the linked lists
corresponding to the node 4, we can say that the node 4 is encapsulated as a member in the two
encapsulation classes generated under the nodes 1 and 2, and that there are parallel edges between 4 and 5
defined with respect to the higher level nodes 1 and 2 respectively.

5
7
8
14/E
0/E
9
11/E
12/E
2/L/0
9/E
10
12/L/9
9/E
11
15/E
16/E
11/L/9
9/E
12
8/E
13
8/E
14
5/L/2
11/E
15
11/E
16

(Table 1: Data structure for the data model of Figure 1)
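
The right-link and left-link lists described above can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: plain Python lists stand in for the doubly linked lists, the names `Entry` and `GraphModel` are hypothetical, and the "high value" marker for the topmost node is not modeled. Only relations that the text states explicitly are entered.

```python
from collections import namedtuple

# One list element: vertex number, relation tag ("L" = encapsulation,
# "E" = direct edge) and the functional abstraction node it is based on.
Entry = namedtuple("Entry", "vertex tag node")


class GraphModel:
    def __init__(self, n_vertices: int):
        # One slot per vertex, like the pointer array in the text.
        self.right = [[] for _ in range(n_vertices)]  # members and outgoing edges
        self.left = [[] for _ in range(n_vertices)]   # encapsulators and incoming edges

    def encap(self, members, under):
        """Encapsulate `members` under the functional abstraction node `under`."""
        for m in members:
            self.right[under].append(Entry(m, "L", under))
            self.left[m].append(Entry(under, "L", under))

    def rel(self, src, dst, wrt):
        """Declare a direct edge src -> dst with respect to abstraction node `wrt`."""
        self.right[src].append(Entry(dst, "E", wrt))
        self.left[dst].append(Entry(src, "E", wrt))


g = GraphModel(17)
g.encap([1, 2, 9], under=0)
g.rel(1, 2, wrt=0)
g.rel(9, 1, wrt=0)
g.encap([3, 4, 5], under=1)
g.rel(4, 5, wrt=1)
g.encap([4, 5, 6, 7, 8], under=2)
g.rel(4, 5, wrt=2)

print(g.right[1])  # members 3, 4, 5 and the edge to 2
print(g.left[4])   # node 4 is a member under both 1 and 2
```

Keeping the abstraction node in every entry is what lets the two parallel edges between nodes 4 and 5 coexist: they differ only in the node with respect to which each one is defined.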


3.  Data  Description  Language 

A data description language is also introduced in this paper, by which the system designer is
able to easily describe the graph based data model and to modify it according to the needs of the
application. A user-friendly script is provided to the designer for easy description of the data model. The
equivalent mathematical script (relations) is generated and executed through the procedures
provided in the software, producing the data structure, i.e. the data model, as an output.

3.1.  Creation  of  the  graph  model 

The DDL provided here has a two-fold job: to create the graph model from the
information initially available to the system designer, and to modify it according to the requirements of the
application. In this section we describe how the data model depicted in Fig. 1 is created
with this DDL.


We  consider  that  there  are  three  basic  operations  from  the  data  description  point  of  view. 

a.  Creation  of  nodes,  as  an  element  of  an  array. 

b.  Encapsulation  of  nodes  belonging  to  a  semantic  class  with  respect  to  a  higher  level 
functional  abstraction  node. 

c.  Declaration  of  direct  relationship  amongst  nodes  within  the  graph. 

The  syntax  of  the  user-friendly  script  for  the  operations  referred  above  as  a  to  c  is  given  below. 

CREATE GRAPH [GRAPHNAME] [NO OF NODES];

ENCAP [CLASSNAME] [MEMBER OF CLASS] UNDER [FUNCTIONAL ABSTRACTION NODE];

CREATE REL [RELATION NAME] WITH [CLASSNAME] FOR [NODES INVOLVED IN RELATION];

So, to declare the data model of Fig. 1, the designer has to declare the data and their relations
according to this syntax.

CREATE GRAPH G1 17;  * Initially we want to create a graph model named G1 with 17 nodes.

ENCAP E0 [1,2,9] UNDER 0;  * Nodes 1, 2 and 9 are encapsulated under node 0, and the class is
identified by the relation E0.

CREATE REL R1 WITH E0 FOR [1,2];  * The relation named R1 reflects an edge between 1 and 2, defined
with respect to the functional abstraction node mentioned in relation E0, i.e., 0.

CREATE REL R2 WITH E0 FOR [9,1];

CREATE REL R3 WITH E0 FOR [9,2];

Instead of the above three statements, we can write
CREATE REL R1,R2,R3 WITH E0 FOR [1,2],[9,1],[9,2];

ENCAP E1 [3,4,5] UNDER 1;

CREATE REL R4 WITH E1 FOR [4,5];

ENCAP  E2  [4,5,6,7,8]  UNDER  2; 

CREATE  REL  R5  WITH  E2  FOR  [4,5]; 

CREATE  REL  R6  WITH  E2  FOR  [7,8]; 

All the relations amongst nodes within the graph have to be expressed in this manner. After
compilation, the equivalent internal form is generated; the corresponding script is shown here.

ψ G1 [0-16];  * The symbols ψ and φ are used for creation and encapsulation respectively.

E0 = φ[1,2,9]⁰;

R1 = [1,2]/E0;  * The direct relation between 1 and 2 is defined with respect to the functional
abstraction node present in relation E0.

R2 = [9,1]/E0;

R3 = [9,2]/E0;

E1 = φ[3,4,5]¹;

R4 = [4,5]/E1;

E2 = φ[4,5,6,7,8]²;

R5 = [4,5]/E2;

R6 = [7,8]/E2;

These mathematical expressions, derived from the designer's declarations, are treated as input to
the software (also provided with the DDL), and the corresponding data structure is generated as output.
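
As an illustration only, the three creation operations can be mirrored in a few lines of Python. The class GraphModel and its method names are assumptions made for this sketch; they are not the software described here, and the layout of Table 1 is not reproduced. The sketch only records nodes, encapsulation classes and relations, and rejects a relation whose nodes are not members of the named class.

```python
from dataclasses import dataclass, field


@dataclass
class GraphModel:
    """Hypothetical in-memory form of a graph model declared through the DDL."""
    name: str
    nodes: set[int]
    # class name -> (functional abstraction node, member nodes)
    classes: dict[str, tuple[int, set[int]]] = field(default_factory=dict)
    # relation name -> (node, node, class name)
    relations: dict[str, tuple[int, int, str]] = field(default_factory=dict)

    @classmethod
    def create_graph(cls, name: str, node_count: int) -> "GraphModel":
        # CREATE GRAPH G1 17;  -> nodes 0..16
        return cls(name, set(range(node_count)))

    def encap(self, class_name: str, members: list[int], under: int) -> None:
        # ENCAP E0 [1,2,9] UNDER 0;
        missing = (set(members) | {under}) - self.nodes
        if missing:
            raise ValueError(f"unknown nodes: {sorted(missing)}")
        self.classes[class_name] = (under, set(members))

    def create_rel(self, rel_name: str, class_name: str, pair: tuple[int, int]) -> None:
        # CREATE REL R1 WITH E0 FOR [1,2];
        _, members = self.classes[class_name]
        if not set(pair) <= members:
            raise ValueError(f"{pair} is not inside class {class_name}")
        self.relations[rel_name] = (pair[0], pair[1], class_name)


# The declarations given above for the data model of Fig. 1.
g1 = GraphModel.create_graph("G1", 17)
g1.encap("E0", [1, 2, 9], under=0)
g1.create_rel("R1", "E0", (1, 2))
g1.create_rel("R2", "E0", (9, 1))
g1.create_rel("R3", "E0", (9, 2))
g1.encap("E1", [3, 4, 5], under=1)
g1.create_rel("R4", "E1", (4, 5))
g1.encap("E2", [4, 5, 6, 7, 8], under=2)
g1.create_rel("R5", "E2", (4, 5))
g1.create_rel("R6", "E2", (7, 8))
```

Relations R4 and R5 join the same pair of nodes but are stored with different class names, which is how the sketch keeps the two context-dependent edges between nodes 4 and 5 apart.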

The complexity of the algorithm that develops the data structure from a graph of n vertices, as
specified by the system designer, is O(n²). This has been tested in a simulated environment: random
graphs with degree varying from 4 to a maximum of 10 were given as input to the algorithm and the
corresponding data structure was generated. The execution time against the number of vertices is
shown below.


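Number      Execution time (in secs)
of nodes    Degree 4    Degree 7    Degree 10

200         0.054989
300         0.054989    0.054989
400         0.054989    0.10989
500         0.054989    0.10989     0.10989
600         0.10989     0.164835    0.21978
700         0.10989     0.274725    0.384615
800         0.164835    0.32967     0.549451
900         0.21978     0.43956     0.659341

(For 200, 300 and 400 nodes the source lists fewer than three values; they are shown in source
order, and the degree columns they belong to are uncertain.)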

(Figure: execution time plotted against the number of nodes for degree 4 (D4), degree 7 (D7) and
degree 10 (D10), each with a polynomial trend line)


3.2.  Modification  of  the  graph  model 

The basic operations for modifying a graph model are:

a.  Insertion  of  node(s)  in  the  graph  model. 

b.  Deletion  of  an  existing  node. 

c.  Modification  of  an  encapsulation  class  from  a  different  semantic  view. 

d.  Modification  of  an  already  existing  edge. 

The syntax of the user-friendly script for operations a to d is given below.

OPEN GRAPH [GRAPH MODEL NAME];  * Initially the designer has to open the graph model (here G1)
for modification.

INSERT  NODE  [NO.  OF  NODES]; 

DELETE  NODE  [NODES]; 

Generally the designer can delete only the lowest-level nodes. The nodes selected for deletion,
together with the relations involving them, are deleted by this operation. If deletion is attempted
on a functional abstraction node, all the nodes encapsulated within it are deleted along with the
specified node, after the system designer confirms the operation.

Operation c can be implemented by inserting a node from another class into an encapsulation class,
or by substituting a group of nodes with members of a different class. In this case, all the
relations involving the nodes that take part in the operation are deleted, and the designer has to
define the new relationships from the different semantic aspect.

INSERT NODE(S) [NODES] OF CLASS [CLASSNAME] WITHIN CLASS [CLASSNAME];

MODIFY NODE(S) [NODES] OF CLASS [CLASSNAME] BY [NODES] OF CLASS [CLASSNAME];

There are two cases for operation d: either we want to delete an existing relation or to insert a
new relation between a pair of nodes with respect to an abstraction node.

DELETE REL [NODE1,NODE2,ABSTRACTION CLASS];

INSERT REL [NODE1,NODE2,ABSTRACTION CLASS];

The  specified  graph  model  must  be  closed  after  the  completion  of  the  modification. 

CLOSE  GRAPH  [GRAPH  MODEL  NAME] . 

Let us illustrate the modification of the graph model with an example. Assume the system designer
wants to perform the following modifications on the graph model of Fig. 1.

1.  Insert two new nodes in the graph model.

2.  Enter these two nodes in the encapsulation class headed by functional abstraction node 2.

3.  Replace node 3 of the encapsulation class headed by node 1 by node 7 of the class under
node 2.

4.  Create an edge from node 7 to node 5 with respect to node 1.

5.  Delete the edge between 4 and 5 defined with respect to node 2.

To perform these modifications, the designer script will be:

OPEN GRAPH G1;

INSERT NODES [2];

INSERT NODES [17,18] OF CLASS E0 WITHIN CLASS E2;

MODIFY NODE [3] OF CLASS E1 BY [7] OF CLASS E2;

INSERT REL [7,5,E1];

DELETE REL [4,5,E2];

CLOSE GRAPH G1.

The  equivalent  mathematical  script  involving  the  new  modified  relations  will  be: 

OPEN G1;

ψ [17-18];  * Create two new nodes in the graph model G1.

E0 = φ[1,2,9,17,18]⁰;  * By default, these two nodes are encapsulated within E0.

E2 = φ[4,5,6,7,8,17,18]²;  * The modified class relation E2.

E0 = φ[1,2,9]⁰;

E1 = φ[4,5,7]¹;

E2 = φ[3,4,5,6,17,18]²;

DEL REL R6;  * Delete the previous relation involving node 7, named R6, as the previous relation may
not exist from the new semantic view. If required, re-describe the relation.

R6 = [7,5]/E1;  * A new relation named R6 is generated.

DEL REL R5;  * The relation between 4 & 5 with respect to node 2 (named R5) will be deleted.

CLOSE G1.

The  mathematical  script  written  above  for  the  modification  is  also  executed  through  the  software 
and  the  data  structure  of  the  conceptual  data  model  will  be  modified  accordingly. 
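
The five modifications listed in the example can be traced with a small Python sketch. Everything here is an assumption made for illustration: the dictionaries, the function names, the reading of MODIFY as a swap of the two nodes that leaves every other class member in place, and the explicit relation name passed to insert_rel. The starting state contains only the classes and relations declared in Section 3.1, and the final call follows item 5 of the list (the edge between 4 and 5 with respect to node 2).

```python
# Hypothetical state after the creation script of Section 3.1 (only the declared parts).
nodes = set(range(17))
classes = {"E0": {1, 2, 9}, "E1": {3, 4, 5}, "E2": {4, 5, 6, 7, 8}}
relations = {"R4": (4, 5, "E1"), "R5": (4, 5, "E2"), "R6": (7, 8, "E2")}


def insert_nodes(count):
    """INSERT NODES [count]: new nodes take the next free numbers and default to class E0."""
    start = max(nodes) + 1
    new = list(range(start, start + count))
    nodes.update(new)
    classes["E0"].update(new)
    return new


def move_nodes(moved, source, target):
    """INSERT NODES [...] OF CLASS source WITHIN CLASS target."""
    classes[source] -= set(moved)
    classes[target] |= set(moved)


def drop_relations_touching(node, class_name):
    """Delete every relation of class_name that involves node."""
    doomed = [n for n, (a, b, c) in relations.items() if c == class_name and node in (a, b)]
    for name in doomed:
        del relations[name]


def modify_node(old, old_class, new, new_class):
    """MODIFY NODE [old] OF CLASS old_class BY [new] OF CLASS new_class (read as a swap)."""
    drop_relations_touching(old, old_class)
    drop_relations_touching(new, new_class)
    classes[old_class] = (classes[old_class] - {old}) | {new}
    classes[new_class] = (classes[new_class] - {new}) | {old}


def insert_rel(name, a, b, class_name):
    """INSERT REL [a,b,class]; the relation name is supplied by the caller in this sketch."""
    relations[name] = (a, b, class_name)


def delete_rel(a, b, class_name):
    """DELETE REL [a,b,class]."""
    doomed = [n for n, rel in relations.items() if rel == (a, b, class_name)]
    for name in doomed:
        del relations[name]


new_nodes = insert_nodes(2)            # step 1: nodes 17 and 18, placed in E0 by default
move_nodes(new_nodes, "E0", "E2")      # step 2
modify_node(3, "E1", 7, "E2")          # step 3: also removes R6 = [7,8]/E2
insert_rel("R6", 7, 5, "E1")           # step 4
delete_rel(4, 5, "E2")                 # step 5: removes R5
```

After these calls the sketch holds only R4 = [4,5]/E1 and the new R6 = [7,5]/E1, and class E1 contains nodes 4, 5 and 7.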

4.  Graph  Data  Model  in  Distributed  Computing  Environment 

In recent years, almost all software has had to be compatible with distributed environments, owing
to the increasing trend towards distributing computer systems over multiple sites interconnected
via a communication network. For our data model, the distributed database concept implies that the
graph designed by the system designer must be spread over the sites of a computer network. The
degree of fragmentation of the graph depends entirely on the nature of the specific application,
e.g., what types of queries will be processed at a specific site and what information is needed for
these queries. Even so, in this section we propose some general fragmentation methodologies for the
graph model to improve performance. Depending on the nature of the queries and the information
needed to process them, the relevant portions of the graph, i.e., the semantic groups, must be
distributed amongst the sites. We term this method graph fragmentation. A site may keep only the
necessary information (occurrences) within a group instead of storing all members of that group.
This is decided dynamically through the generation of certain constraints. In that case an entire
copy of the semantic group must be kept at another site to avoid loss of information.

In some cases, more than one copy of the same semantic group has to be kept at different sites.
Despite the risk of inconsistency during updates, replication is allowed in order to increase
availability and to reduce the communication cost of accessing data from different sites. Our data
model, described in Section 2, allows relationships to be declared between members of different
groups spread over different levels. After fragmentation, the mutual relationships within a group
must belong to the subgraph present at the local site. A table, named the Link table, is maintained
to keep track of the relationships between any node at the local site and nodes at other sites.
This is illustrated with an example in the next paragraph.

Suppose there are three sites S1, S2 and S3 for the abstract data model depicted in Fig. 1 and
that, after analysing the queries to be processed, the designer has decided on the following
distribution.

S1: Semantic groups encapsulated under functional abstraction nodes 1 and 2.

S2: Semantic groups encapsulated under functional abstraction nodes 2 and 8.

S3: Semantic groups encapsulated under functional abstraction nodes 9 and 11.

According to this fragmentation scheme, the data structure depicted in Table 1 is also decomposed
amongst the sites. Figure 2 shows the subgraph belonging to site 1 and Table 2 gives the
corresponding data structure. The dotted lines and vertices indicate elements that do not belong to
site 1 but may be required for query processing.


(Figure 2: Fragmented graph model for site 1)

(Table 2: Data structure for site 1)

If the queries to be processed at site 1 are mostly retrievals, then nodes 13, 15 and 9 should be
replicated at site 1. Otherwise, a link table is maintained for site 1, through which information
can be fetched from the other sites to process the queries.


Site Number    Vertex 1    Vertex 2

2              6           13
3              6           15
3              9           1
3              9           2

(Table  3  :  Link  Table  for  site  1) 
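
One way to picture how such a link table could be derived is to scan the edges of the graph and keep those with exactly one endpoint at the local site. The Python sketch below does this under stated assumptions: the function name, the set of nodes taken as local to site 1, the site assigned to each remote node and the edge list are all illustrative choices made to match the rows of Table 3, not data taken from Figure 1.

```python
def link_table(local_nodes, site_of_remote, edges):
    """Return (site, vertex1, vertex2) rows for edges with exactly one local endpoint."""
    rows = []
    for v1, v2 in edges:
        v1_local = v1 in local_nodes
        v2_local = v2 in local_nodes
        if v1_local == v2_local:
            continue  # both endpoints local, or both remote: no link entry for this site
        remote = v2 if v1_local else v1
        rows.append((site_of_remote[remote], v1, v2))
    return rows


# Assumed placement: nodes 1..8 are held at site 1, node 13 at site 2, nodes 9 and 15 at site 3.
site1_nodes = {1, 2, 3, 4, 5, 6, 7, 8}
site_of_remote = {13: 2, 15: 3, 9: 3}
# Assumed edge list: three local edges followed by four edges that cross site boundaries.
edges = [(1, 2), (4, 5), (7, 8), (6, 13), (6, 15), (9, 1), (9, 2)]

table3 = link_table(site1_nodes, site_of_remote, edges)
for site, v1, v2 in table3:
    print(site, v1, v2)
```

Each row keeps the vertex order of the original edge, so the remote node can appear in either vertex column, as it does in Table 3.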

Site 1 can be created with the following set of commands provided in the proposed data model.

Open graph G1;

Fragment group 1,2 into site1;

In a similar way, the other sites can be created, and the fragments can be joined back to
regenerate the data model of Figure 1 by the graph join operation provided in our data model.


5.  Conclusion 

In this paper, an attempt has been made to present an alternative approach to the storage and
maintenance of information systems based on semi-structured data. Our data model may offer better
performance for the reasons given below.

Maintainability: The relational model does not provide a structured view of the entities present
in a large information system. To find the actual relations among entities scattered through
different tables, or the relations between tables, the designer has to derive them via common
attributes by searching through the tables. In a large, complicated system this is a cumbersome
process. Because our proposed model is structured, the designer can easily find the relationships
between attributes (lowest-level nodes) or functional abstraction nodes directly via the option
provided in our DDL.

Adaptability: It would be ideal for the system designer if all information about the application
were clearly known right at the beginning. Unfortunately, in practice the design process has to
start with only limited knowledge, and the system is gradually enriched as new information is
included. The data model should therefore make it easy to include new information in the existing
data model. The proposed model is flexible: the designer can easily incorporate new information as
nodes in the graph and describe the relations of the new nodes with the existing nodes through
edges. It also provides the facility to redefine the relationships from a different semantic view,
so the designer can maintain different semantic views of the same data model as required. The model
is thus adaptable, providing more than one view of the same data model, each with respect to a
separate semantic, at the system designer level. (This view is entirely different from the view
provided at the end user level.)

Context sensitivity of the relations: The concept of functional abstraction is introduced to
increase the effectiveness of the model. The designer can formulate the behavioral aspects of the
entities by forming an encapsulation class with respect to a functional abstraction node and can
declare the mutual relationships among the members of the class. All these relations are context
sensitive, i.e., declared from a specific semantic that is incorporated within the functional
abstraction node. For example, the parallel edges between nodes 4 and 5 in Fig. 1 are context
sensitive: one is defined with respect to abstraction node 1 and the other with respect to node 2.
A significant improvement is therefore expected from this graph-based data model with regard to the
points mentioned in this section. The present work may be further consolidated by treating a table
as a node, in lieu of occurrences of the attributes, to maintain user-friendliness at the end user
level.

Measuring and Evaluating
Maintenance Process Using
Reliability, Risk, and Test Metrics

Abstract: In analyzing the stability of a maintenance process, it is important that it not be
treated in isolation from the reliability and risk of deploying the software that result from
applying the process. Furthermore, we need to consider the efficiency of the test effort that is a
part of the process and a determinant of reliability and risk of deployment. The relationship
between product quality and process capability and maturity has been recognized as a major issue in
software engineering based on the premise that improvements in process will lead to higher quality
products. To this end, we have been investigating an important facet of process capability,
stability, as defined and evaluated by trend, change, and shape metrics, across releases and within
a release. Our integration of product and process measurement serves the dual purpose of using
metrics to assess and predict reliability and risk and to evaluate process stability. We use the
NASA Space Shuttle flight software to illustrate our approach.

Index Terms: Maintenance process stability, product and process integration, reliability risk.


1  Introduction

Measuring and evaluating the stability of maintenance processes is important because of the
recognized relationship between process quality and product quality [7]. We focus on the important
quality factor reliability. A maintenance process can quickly become unstable because the very act
of installing software changes the environment: pressures operate to modify the environment, the
problem, and the technological solutions. Changes generated by users and the environment and the
consequent need for adapting the software to the changes are unpredictable and cannot be
accommodated without iteration. Programs must be adaptable to change and the resultant change
process must be planned and controlled. According to Lehman, large programs are never completed,
they just continue to evolve [11]. In other words, with software, we are dealing with a moving
target. Maintenance is performed continuously and the stability of the maintenance process has an
effect on product reliability. Therefore, when we analyzed the stability of the NASA Space Shuttle
software maintenance process, it was important to consider the reliability of the software that the
process produces. Furthermore, we needed to consider the efficiency of the test effort that is a
part of the process and a determinant of reliability. Therefore, we integrated these factors into a
unified model, which allowed us to measure the influence of maintenance actions and test effort on
the reliability of the software. Our hypothesis was that these metrics would exhibit trends and
other characteristics over time that would be indicative of the stability of the process. Our
results indicate that this is the case.

We conducted research on the NASA Space Shuttle flight software to investigate a hypothesis of
measuring and evaluating maintenance stability. We used several metrics and applied them across
releases of the software and within releases. The trends and shapes of metric functions over time
provide evidence of whether the software maintenance process is stable. We view stability as the
condition of a process that results in increasing reliability, decreasing risk of deployment, and
increasing test effectiveness. In addition, our focus is on process stability, not code stability.
We explain our criteria for stability; describe metrics, trends, and shapes for judging stability;
document the data that was collected; and show how to apply our approach. Building on our previous
work of defining maintenance stability criteria and developing and applying trend metrics for
stability evaluation [15], in this paper we review related research projects, introduce shape
metrics for stability evaluation, apply our change metric for multiple release stability
evaluation, consider the functionality of the software product in stability evaluation, and
interpret the metric results in terms of process improvements.

Our emphasis in this paper is to propose a unified product and process measurement model for
product evaluation and process stability analysis. The reader should focus on the model principles
and not on the results obtained for the Shuttle. These are used only to illustrate the model
concepts. In general, different numerical results would be obtained for other applications that use
this model.

Section 2 reviews related research. In Section 3, the concept of stability is explained and trend
and shape metrics are defined. Section 4 defines the data and the NASA Space Shuttle application
environment. Section 5 gives an analysis of relationships among maintenance, reliability, test
effort, and risk, while Section 6 discusses both long term (i.e., across releases) and short term
(i.e., within a release), as applied to the NASA Space Shuttle. Section 7 discusses our attempt to
relate product metrics to process improvements and to the functionality and complexity of the
software. Conclusions are drawn in Section 8.

2  Related  Research  and  Projects 

A number of useful related maintenance measurement and process projects have been reported in the
literature. Briand et al. developed a process to characterize software maintenance projects [3].
They present a qualitative and inductive methodology for performing objective project
characterizations to identify maintenance problems and needs. This methodology aids in determining
causal links between maintenance problems and flaws in the maintenance organization and process.
Although the authors have related ineffective maintenance practices to organizational and process
problems, they have not made a linkage to product reliability and process stability.

Gefen and Schneberger developed the hypothesis that maintenance proceeds in three distinct serial
phases: corrective modification, similar to testing; improvement in function within the original
specifications; and the addition of new applications that go beyond the original specifications
[5]. Their results from a single large information system, which they studied in great depth,
suggested that software maintenance is a multiperiod process. In the NASA Space Shuttle maintenance
process, in contrast, all three types of maintenance activities are performed concurrently and are
accompanied by continuous testing.

Henry et al. found a strong correlation between errors corrected per module and the impact of the
software upgrade [6]. This information can be used to rank modules by their upgrade impact during
code inspection in order to find and correct these errors before the software enters the expensive
test phase. The authors treat the impact of change but do not relate this impact to process
stability.

Khoshgoftaar et al. used discriminant analysis in each iteration of their project to predict
fault-prone modules in the next iteration [10]. This approach provided an advance indication of
reliability and the risk of implementing the next iteration. This study deals with product
reliability but does not address the issue of process stability.

Pearse  and  Oman  applied  a  maintenance  metrics  index 
to  measure  the  maintainability  of  C  source  code  before  and 
after  maintenance  activities  [13].  This  technique  allowed  the 
project  engineers  to  track  the  "health"  of  the  code  as  it  was 
being  maintained.  Maintainability  was  assessed  but  not  in 
terms  of  process  stability. 

Pigoski and Nelson collected and analyzed metrics on size, trouble reports, change proposals,
staffing, and trouble report and change proposal completion times [14]. A major benefit of this
project was the use of trends to identify the relationship between the productivity of the
maintenance organization and staffing levels. Although productivity was addressed, product
reliability and process stability were not considered.

Sneed reengineered a client maintenance process to conform to the ANSI/IEEE Standard 1219, Standard
for Software Maintenance [19]. This project is a good example of how a standard can provide a basic
framework for a process and can be tailored to the characteristics of the project environment.
Although applying a standard is an appropriate element of a good process, product reliability and
process stability were not addressed.

Stark  collected  and  analyzed  metrics  in  the  categories  of 
customer  satisfaction,  cost,  and  schedule  with  the  objective 
of  focusing  management's  attention  on  improvement  areas 
and  tracking  improvements  over  time  [20].  This  approach 
aided  management  in  deciding  whether  to  include  changes 
in  the  current  release,  with  possible  schedule  slippage,  or 
include  the  changes  in  the  next  release.  However,  the 
authors  did  not  relate  these  metrics  to  process  stability. 

Although  there  were  similarities  between  these  projects 
and  our  research,  our  work  differed  in  that  we  integrated: 
1)  maintenance  actions,  2)  reliability,  3)  test  effort,  and  4) 
risk  to  the  safety  of  mission  and  crew  of  deploying  the 
software  after  maintenance  actions,  for  the  purpose  of 
analyzing  and  evaluating  the  stability  of  the  maintenance 
process. 

3  Concept  of  Stability 
3.1  Trend  Metrics 

To gain insight into the interaction of the maintenance process with product metrics like
reliability, two types of metrics were analyzed: trend and shape. Both types are used to assess and
predict maintenance process stability across (long term) and within (short term) releases after the
software is released and maintained. Shape metrics are described in Section 3.2. By chronologically
ordering metric values by release date, we obtain discrete functions in time that can be analyzed
for trends across releases. Similarly, by observing the sequence of metric values as continuous
functions of increasing test time, we can analyze trends within releases. These metrics are defined
as empirical and predicted functions that are assigned values based on release date (long term) or
test time (short term). When analyzing trends, we note whether an increasing or decreasing trend is
favorable [15]. For example, an increasing trend in Time to Next Failure and a decreasing trend in
Failures per KLOC would be favorable. Conversely, a decreasing trend in Time to Next Failure and an
increasing trend in Failures per KLOC would be unfavorable. A favorable trend is indicative of
maintenance stability if the functionality of the software has increased with time across releases
and within releases. Increasing functionality is the norm in software projects due to the
enhancement that users demand over time. We impose this condition because if favorable trends are
observed, they could be the result of decreasing functionality rather than having achieved
maintenance stability. When trends in these metrics over time are favorable (e.g., increasing
reliability), we conclude that the maintenance process is stable with respect to the software
metric (reliability). Conversely, when the trends are unfavorable (e.g., decreasing reliability),
we conclude that the process is unstable. Our research investigated whether there were
relationships among the following factors: 1) maintenance actions, 2) reliability, and 3) test
effort. We use the following types of trend metrics:




SCHNEIDEWIND:  MEASURING  AND  EVALUATING  MAINTENANCE  PROCESS  USING 


RELIABILITY,  RISK,  AND  TEST  METRICS 


1.  Maintenance actions: KLOC Change to the Code (i.e.,
amount of code change necessary to add given
functionality);

2.  Reliability: Various reliability metrics (e.g., MTTF,
Total Failures, Remaining Failures, and Time to Next
Failure); and

3.  Test effort: Total Test Time.

3.1.1  Change Metric

Although looking for a trend metric on a graph is useful, it
is not a precise way of measuring stability, particularly if
the graph has peaks and valleys and the measurements are
made at discrete points in time. Therefore, we developed a
Change Metric (CM), which is computed as follows:

1.  Note the change in a metric from one release to the
next (i.e., release j to release j + 1).

2.  If the change is in the desirable direction (e.g.,
Failures/KLOC decrease), treat the change in 1 as
positive. If the change is in the undesirable direction
(e.g., Failures/KLOC increase), treat the change in 1
as negative.

3.  If the change in 1 is an increase, divide it by the
value of the metric in release j + 1. If the change in 1
is a decrease, divide it by the value of the metric in
release j.

4.  Compute the average of the values obtained in 3,
taking into account sign. This is the change metric
(CM). The CM is a quantity in the range [-1, 1]. A
positive value indicates stability; a negative value
indicates instability. The numeric value of CM
indicates the degree of stability or instability. For
example, 0.1 would indicate marginal stability and
0.9 would indicate high stability. Similarly, -0.1
would indicate marginal instability and -0.9 would
indicate high instability. The standard deviation of
these values can also be computed. Note that CM
only pertains to stability or instability with respect to
the particular metric that has been evaluated (e.g.,
Failures/KLOC). The evaluation of stability should
be made with respect to a set of metrics and not a
single metric. The average of the CM for a set of
metrics can be computed to obtain an overall metric
of stability.
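
Steps 1 to 4 can be sketched in Python as follows. The function name, its lower_is_better flag and the release values are assumptions made for illustration, not Shuttle data, and the sketch assumes positive metric values. Dividing each change by the larger of the two release values (step 3) is what keeps every term, and therefore the average, inside the range [-1, 1].

```python
from statistics import mean


def change_metric(values, lower_is_better):
    """Average signed relative change between consecutive releases (steps 1 to 4)."""
    if len(values) < 2:
        raise ValueError("need at least two releases")
    terms = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev                      # step 1: change from release j to j + 1
        if delta == 0:
            terms.append(0.0)
            continue
        # Step 3: an increase is divided by the later value, a decrease by the earlier one.
        magnitude = abs(delta) / (curr if delta > 0 else prev)
        # Step 2: positive sign when the change is in the desirable direction.
        desirable = (delta < 0) == lower_is_better
        terms.append(magnitude if desirable else -magnitude)
    return mean(terms)                           # step 4


# Hypothetical metric values over three releases.
failures_per_kloc = [4.0, 2.0, 1.0]   # decreasing, which is desirable
mttf = [10.0, 20.0, 10.0]             # up, then back down

cm_failures = change_metric(failures_per_kloc, lower_is_better=True)   # 0.5
cm_mttf = change_metric(mttf, lower_is_better=False)                   # 0.0
overall = mean([cm_failures, cm_mttf])                                 # 0.25
```

The last line averages the CM over the two metrics, in the spirit of the overall stability measure mentioned at the end of step 4.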

3.2  Shape  Metrics 

In  addition  to  trends  in  metrics,  the  shapes  of  metric  functions 
provide  indicators  of  maintenance  stability.  We  use  shape 
metrics  to  analyze  the  stability  of  an  individual  release  and 
the  trend  of  these  metrics  across  releases  to  analyze  long-term 
stability.  The  rationale  of  these  metrics  is  that  it  is  better  to 
reach  important  points  in  the  growth  of  product  reliability 
sooner  than  later.  If  we  reach  these  points  late  in  testing,  it  is 
indicative  of  a  process  that  is  late  in  achieving  stability.  We 
use  the  following  types  of  shape  metrics: 

1. Direction and magnitude of the slope of a metric
function (e.g., failure rate decreases asymptotically
with total test time). Using failure rate as an example
within a release, it is desirable that it rapidly
decrease toward zero with increasing total test time
and that it have small values.

2.  Percent  of  total  test  time  at  which  a  metric  function 
changes from unstable (e.g., increasing failure rate)
to  stable  (e.g.,  decreasing  failure  rate)  and  remains 
stable.  Across  releases,  it  is  desirable  that  the  total 
test  time  at  which  a  metric  function  becomes  stable 
gets  progressively  smaller. 

3.  Percent  of  total  test  time  at  which  a  metric  function 
increases  at  a  maximum  rate  in  a  favorable  direction 
(e.g.,  failure  rate  has  maximum  negative  rate  of 
change). Using failure rate as an example, it is
desirable  for  it  to  achieve  maximum  rate  of  decrease 
as  soon  as  possible,  as  a  function  of  total  test  time. 

4.  Test  time  at  which  a  metric  function  reaches  its 
maximum  value  (e.g.,  test  time  at  which  failure  rate 
reaches  its  maximum  value).  Using  failure  rate  as  an 
example,  it  is  desirable  for  it  to  reach  its  maximum 
value  (i.e.,  transition  from  unstable  to  stable)  as  soon 
as  possible,  as  a  function  of  total  test  time. 

5.  Risk:  Probability  of  not  meeting  reliability  and  safety 
goals  (e.g.,  time  to  next  failure  should  exceed 
mission  duration),  using  various  shape  metrics  as 
indicators  of  risk.  Risk  would  be  low  if  the 
conditions  in  1-4  above  obtain. 
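
Shape metrics 2 through 4 can be read off a sampled failure-rate curve. The sketch below is one possible interpretation, not the authors' procedure: the function name, the dictionary keys, the finite-difference slope, and the sample data are all assumptions made for illustration.

```python
def shape_metrics(times, rates):
    """Shape metrics for one release from a sampled metric function.

    times: increasing test times; rates: failure rate observed at each time.
    """
    total = times[-1]
    # Finite-difference slope of the curve on each segment.
    slopes = [
        (rates[k + 1] - rates[k]) / (times[k + 1] - times[k])
        for k in range(len(rates) - 1)
    ]

    # Shape metric 4: test time at which the failure rate peaks.
    peak = max(range(len(rates)), key=lambda k: rates[k])

    # Shape metric 2: earliest point after which the rate never rises again.
    stable = len(rates) - 1
    for k in range(len(slopes) - 1, -1, -1):
        if slopes[k] > 0:
            break
        stable = k

    # Shape metric 3: start of the segment with the steepest decrease.
    steepest = min(range(len(slopes)), key=lambda k: slopes[k])

    return {
        "percent_time_at_peak": 100.0 * times[peak] / total,
        "percent_time_stable": 100.0 * times[stable] / total,
        "percent_time_max_decrease": 100.0 * times[steepest] / total,
        "max_rate": rates[peak],
    }


# Hypothetical data: failures per interval sampled at ten test intervals.
times = list(range(1, 11))
rates = [0.10, 0.30, 0.50, 0.40, 0.20, 0.15, 0.12, 0.10, 0.09, 0.08]
print(shape_metrics(times, rates))
```

A curve that is still rising at the end of testing yields 100 percent for the stability point, which is the least favorable outcome under metric 2.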

3.3  Metrics  for  Long-Term  Analysis 

We  use  certain  metrics  only  for  long-term  analysis.  As  an 
example,  we  compute  the  following  trend  metrics  over  a 
sequence  of  releases: 

1.  Mean  Time  to  Failure  (MTTF). 

2.  Total  Failures  normalized  by  KLOC  Change  to  the 
Code. 

3. Total Test Time normalized by KLOC Change to the
Code. 

4.  Remaining  Failures  normalized  by  KLOC  Change  to 
the  Code. 

5.  Time  to  Next  Failure. 

3.4  Metrics  for  Long-  and  Short-Term  Analysis 

We  use  other  metrics  for  both  long-term  and  short-term 
analysis.  As  an  example,  we  compute  the  following  trend 
(1)  and  shape  (2,  3,  4,  and  5)  metrics  over  a  sequence  of 
releases  and  within  a  given  release: 

1.  Percent  of  Total  Test  Time  required  for  Remaining 
Failures  to  reach  a  specified  value. 

2. Degree to which Failure Rate asymptotically approaches
zero with increasing Total Test Time.

3.  Percent  of  Total  Test  Time  required  for  Failure  Rate 
to  become  stable  and  remain  stable. 

4.  Percent  of  Total  Test  Time  required  for  Failure  Rate 
to  reach  maximum  decreasing  rate  of  change  (i.e., 
slope  of  the  failure  rate  curve). 

5.  Maximum  Failure  Rate  and  Total  Test  Time  where 
Failure  Rate  is  maximum. 

4  Data  and  Example  Application 

TABLE 1

Characteristics of Maintained Software Across NASA Space Shuttle Releases (Part 1)

Operational  Release   Launch       Mission   Reliability  Total Post  Failure
Increment    Date      Date         Duration  Prediction   Delivery    Severity
                                    (Days)    Date         Failures
A            9/1/83    No Flights   -         -            6           One 2, Five 3
B            12/12/83  8/30/84      6         8/14/84      10          Two 2, Eight 3
C            6/8/84    4/12/85      7         1/17/85      10          Two 2, Seven 3, One 4
D            10/5/84   11/26/85     7         10/22/85     12          Five 2, Seven 3
E            2/15/85   1/12/86      6         5/11/89      5           One 2, Four 3
F            12/17/85  -            -         -            2           [illegible]
G            6/5/87    -            -         -            3           [illegible]
H            10/13/88  -            -         -            3           [illegible]
I            6/29/89   -            -         -            3           Three 3
J            6/18/90   [illegible]  9         7/19/91      7           Seven 3
K            5/21/91   -            -         -            1           [illegible]
L            6/15/92   -            -         -            3           One 1, One 2, One 3
M            7/15/93   -            -         -            1           One 3
N            7/13/94   -            -         -            1           One 3
O            10/18/95  11/19/96     18        9/26/96      5           One 2, Four 3
P            7/18/96   -            -         -            3           One 2, Two 3
Q            3/5/97    -            -         -            1           One 3

We use the NASA Space Shuttle application to illustrate the
concepts. This large maintenance project has been evolving
with increasing functionality since 1983 [2]. We use data
collected  from  the  developer  of  the  flight  software  of  the 
NASA  Space  Shuttle,  as  shown  in  Table  1,  Part  1,  and 
Table  2,  Part  2.  These  tables  show  Operational  Increments 
(OIs)  of  the  NASA  Space  Shuttle:  OIA...  OIQ,  covering  the 
period  1983-1997.  We  define  an  OI  as  follows:  a  software 
system  comprised  of  modules  and  configured  from  a  series 
of  builds  to  meet  NASA  Space  Shuttle  mission  functional 
requirements  [16].  In  Part  1,  for  each  of  the  OIs,  we  show  the 
Release  Date  (the  date  of  release  by  the  contractor  to 
NASA),  Total  Post  Delivery  Failures,  and  Failure  Severity 
(decreasing  in  severity  from  "1"  to  "4").  In  Part  2,  we  show 
the  maintenance  change  to  the  code  in  KLOC  (source 
language changes and additions) and the total test time of
the OI. In addition, for those OIs with at least two failures,
we show the computation of MTTF, Failures/KLOC, and
Total Test Time/KLOC. KLOC is an indicator of maintenance
actions, not functionality [8]. Increased functionality,
as measured by the increase in the size of principal
functions loaded into mass memory, has averaged about
2 percent over the last 10 OIs. Therefore, if a stable process
were  observed,  it  could  not  be  attributed  to  decreasing 
functionality.  Also  to  be  noted  is  that  the  software 
developer  is  a  CMM  Level  5  organization  that  has 
continually  improved  its  process. 

TABLE 2

Characteristics of Maintained Software Across NASA Space Shuttle Releases (Part 2)

Operational  KLOC    Total Test  MTTF    Total          Total Test Time/
Increment    Change  Time        (Days)  Failures/KLOC  KLOC Change
                     (Days)              Change         (Days)
A            8.0     1078        179.7   0.750          134.8
B            11.4    4096        409.6   0.877          359.3
C            5.9     4060        406.0   1.695          688.1
D            12.2    2307        192.3   0.984          189.1
E            8.8     1873        374.6   0.568          212.8
F            6.6     412         206.0   0.303          62.4
G            6.3     3077        1025.7  0.476          488.4
H            7.0     540         180.0   0.429          77.1
I            12.1    2632        877.3   0.248          217.5
J            29.4    515         73.6    0.238          17.5
K            21.3    182         -       -              8.5
L            34.4    1337        445.7   0.087          38.9
M            24.0    386         -       -              16.1
N            10.4    121         -       -              11.6
O            15.3    344         68.8    0.327          22.5
P            7.3     272         90.7    0.411          37.3
Q            11.0    75          -       -              6.8

Because the flight software is run continuously, around
the clock, in simulation, test, or flight, Total Test Time refers
to continuous execution time from the time of release. For
OIs where there was a sufficient sample size (i.e., Total Post
Delivery Failures), namely OIA, OIB, OIC, OID, OIE, OIJ, and
OIO, we predicted software reliability. For these OIs, we
show  Launch  Date,  Mission  Duration,  and  Reliability 
Prediction  date  (i.e.,  the  date  when  we  made  a  prediction). 
Fortunately,  for  the  safety  of  the  crew  and  mission,  there 
have  been  few  postdelivery  failures.  Unfortunately,  from 
the  standpoint  of  prediction,  there  is  a  sparse  set  of 
observed  failures  from  which  to  estimate  reliability  model 
parameters,  particularly  for  recent  OIs.  Nevertheless,  we 
predict  reliability  prior  to  launch  date  for  OIs  with  as  few  as 
five  failures  spanning  many  months  of  maintenance  and 
testing.  In  the  case  of  OIE,  we  predict  reliability  after  launch 
because  no  failures  had  occurred  prior  to  launch  to  use  in 
the  prediction  model.  Because  of  the  scarcity  of  failure  data, 
we made predictions using all severity levels of failure data.
This turns out to be beneficial when making reliability risk


assessments  using  number  of  Remaining  Failures.  For 
example,  rather  than  specifying  that  the  number  of 
predicted  Remaining  Failures  must  not  exceed  one  severity 
"1,"  the  criterion  could  specify  that  the  prediction  not 
exceed  one  failure  of  any  type — a  more  conservative 
criterion  [16]. 

As  would  be  expected,  the  number  of  predelivery 
failures  is  much  greater  than  the  number  of  postdelivery 
failures  because  the  software  is  not  as  mature  from  a 
reliability  standpoint.  Thus,  a  way  around  the  insufficient 
sample  size  of  recent  OIs  for  reliability  prediction  is  to  use 
predelivery  failures  for  model  fit  and  then  use  the  fitted 
model  to  predict  postdelivery  failures.  However,  we  are  not 
sure  that  this  approach  is  appropriate  because  the  multiple 
builds  in  which  failures  can  occur  and  the  test  strategies 
used  to  attempt  to  crash  various  pieces  of  code  during  the 
predelivery  process  contrast  sharply  with  the  postdelivery 
environment of testing an integrated OI with operational
scenarios.  Nevertheless,  we  are  experimenting  with  this 
approach  in  order  to  evaluate  the  prediction  accuracy.  The 
results  will  be  reported  in  a  future  paper. 

5  Relationship  between  Maintenance, 
Reliability,  Risk,  and  Test  Effort 

5.1  Metrics  for  Long-Term  Analysis 

We  want  our  maintenance  effort  to  result  in  increasing 
reliability  of  software  over  a  sequence  of  releases.  A  graph 
of this relationship over calendar time and the accompanying
CM calculations indicate whether the long-term maintenance
effort has been successful as it relates to reliability.
In  order  to  measure  whether  this  is  the  case,  we  use  both 
predicted  and  actual  values  of  metrics.  We  predict 
reliability  in  advance  of  deploying  the  software.  If  the 
predictions  are  favorable,  we  have  confidence  that  the  risk 
is  acceptable  to  deploy  the  software.  If  the  predictions  are 
unfavorable,  we  may  decide  to  delay  deployment  and 
perform  additional  inspection  and  testing.  Another  reason 
for  making  predictions  is  to  assess  whether  the  maintenance 
process  is  effective  in  improving  reliability  and  to  do  it 
sufficiently  early  during  maintenance  to  improve  the 
maintenance  process.  In  addition  to  making  predictions, 
we  collected  and  analyzed  historical  reliability  data.  These 
data  show  in  retrospect  whether  maintenance  actions  were 
successful  in  increasing  reliability.  In  addition,  the  test 
effort  should  not  be  disproportionate  to  the  amount  of  code 
that  is  changed  and  to  the  reliability  that  is  achieved  as  a 
result  of  maintenance  actions. 

5.1.1  Mean  Time  to  Failure 

We want Mean Time to Failure (MTTF), as computed by (1),
to show an increasing trend across releases, indicating
increasing reliability.

Mean Time to Failure = Total Test Time / Total Number of Failures During Test    (1)

Fig. 1. Mean time to failure across releases. (x-axis: Months Since Release of First OI; OIs plotted: A, B, C, D, E, J, O)

5.1.2  Total  Failures 

Similarly,  we  want  Total  Failures  (and  faults),  normalized 
by  KLOC  Change  in  Code,  as  computed  by  (2),  to  show  a 
decreasing  trend  across  releases,  indicating  that  reliability 
is  increasing  with  respect  to  code  changes. 

Total Failures/KLOC = Total Number of Failures During Test / KLOC Change in Code on the OI    (2)

We plot (1) and (2) in Fig. 1 and Fig. 2, respectively,
against Release Time of OI. This is the number of months
since the release of the first OI (OIA), whose release time is
taken as "0". We identify the OIs at the bottom of the plots. Both of
these plots use actual values (i.e., historical data). The CM
value for (1) is -0.060, indicating small instability with
respect to MTTF, and 0.087 for (2), indicating small stability
with respect to normalized Total Failures. The corresponding
standard deviations are 0.541 and 0.442. The large variability
in CM in this application is due to the large
variability in functionality across releases. Furthermore, it is
not our objective to judge the process that is used in this
example. Rather, our purpose in showing these and
subsequent values of CM is to illustrate our model. We
use these plots and the CM to assess the long-term stability
of the maintenance process. We show example computations
of CM for (1) and (2) in Table 3.

Fig. 2. Total failures per KLOC across releases.

5.1.3 Total Test Time

We  want  Total  Test  Time,  normalized  by  KLOC  Change  in 
Code,  as  computed  by  (3),  to  show  a  decreasing  trend 
across  releases,  indicating  that  test  effort  is  decreasing  with 
respect  to  code  changes. 

Total Test Time/KLOC = Total Test Time / KLOC Change in Code on the OI    (3)

We  plot  (3)  in  Fig.  3  against  Release  Time  of  OI,  using 
actual values. The CM value for this plot is 0.116, with a
standard  deviation  of  0.626,  indicating  stability  with  respect 
to  efficiency  of  test  effort.  We  use  this  plot  and  the  CM  to 
assess  whether  testing  is  efficient  with  respect  to  the 
amount  of  code  that  has  been  changed. 
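
Equations (1) through (3) are simple ratios; the following sketch applies them to the OIA row of Tables 1 and 2 (1078 days of Total Test Time, 6 failures, 8.0 KLOC changed). The function names are assumptions introduced for this illustration.

```python
def mttf(total_test_time_days, failures):
    """Equation (1): Mean Time to Failure, in days."""
    return total_test_time_days / failures


def failures_per_kloc(failures, kloc_change):
    """Equation (2): Total Failures normalized by KLOC Change in Code."""
    return failures / kloc_change


def total_test_time_per_kloc(total_test_time_days, kloc_change):
    """Equation (3): Total Test Time normalized by KLOC Change in Code."""
    return total_test_time_days / kloc_change


# OIA values taken from Tables 1 and 2.
test_time_days, failures, kloc_change = 1078, 6, 8.0
print(f"MTTF: {mttf(test_time_days, failures):.1f} days")
print(f"Failures/KLOC: {failures_per_kloc(failures, kloc_change):.3f}")
print(f"Test Time/KLOC: {total_test_time_per_kloc(test_time_days, kloc_change):.1f} days")
```

Feeding each of these series, release by release, into the CM procedure of Section 3.1 gives the trend values discussed in this section.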

5.2  Reliability  Predictions 

5.2.1 Total Failures

Up  to  this  point,  we  have  used  only  actual  data  in  the 
analysis.  Now  we  expand  the  analysis  to  use  both 
predictions  and  actual  data  but  only  for  the  seven  OIs 
where  we  could  make  predictions.  Using  the  Schneidewind 
Model  [1],  [9],  [16],  [17],  [18]  and  the  SMERFS  software 
reliability tool [4], we show prediction equations, using
30 day time intervals, and make predictions for OIA, OIB,
OIC, OID, OIE, OIJ, and OIO. This model or any other
applicable model may be used [1], [4].

To predict Total Failures in the range [1, ∞] (i.e., failures
over the life of the software), we use (4):

$$F(\infty) = \frac{\alpha}{\beta} + X_{s-1} \tag{4}$$

where the terms are defined as follows:

$s$: starting time interval for using failure counts for
computing parameters $\alpha$ and $\beta$,
$\alpha$: initial failure rate,
$\beta$: rate of change of failure rate, and
$X_{s-1}$: observed failure count in the range [1, s - 1].

Now,  we  predict  Total  Failures  normalized  by  KLOC 
Change  in  Code.  We  want  predicted  normalized  Total 
Failures  to  show  a  decreasing  trend  across  releases.  We 
computed  a  CM  value  for  this  data  of  0.115,  with  a 
standard  deviation  of  0.271,  indicating  stability  with  respect 
to  predicted  normalized  Total  Failures. 

5.2.2  Remaining  Failures 

To  predict  Remaining  Failures  r(t)  at  time  t,  we  use  (5)  [1], 
PM17]: 

$$r(t) = F(\infty) - X_t \tag{5}$$


TABLE 3

Example Computations of Change Metric (CM)

Operational  MTTF    Relative  Total          Relative
Increment    (Days)  Change    Failures/KLOC  Change
A            179.7   -         0.750          -
B            409.6   0.562     0.877          -0.145
C            406.0   -0.007    1.695          -0.483
D            192.3   -0.527    0.984          0.419
E            374.6   0.487     0.568          0.423
J            73.6    -0.805    0.238          0.581
O            68.8    -0.068    0.330          -0.272
CM                   -0.060                   0.087

Fig. 3. Total test time per KLOC across releases.

Fig. 4. Reliability of maintained software: remaining failures normalized
by change to code.

This is the predicted Total Failures over the life of the
software minus the observed failure count at time t.

We predict Remaining Failures, normalize them by
KLOC Change in Code, and compare them with normalized
actual Remaining Failures for seven OIs in Fig. 4. We
approximate actual Remaining Failures at time t by
subtracting the observed failure count at time t from the
observed Total Failure count at time T, where T >> t. The
reason for this approach is that we are approximating the
failure count over the life of the software by using the
failure count at time T. We want (5) and actual Remaining
Failures, normalized by KLOC Change in Code, to show a
decreasing trend over a sequence of releases. The CM values
for these plots are 0.107 and 0.277, respectively, indicating
stability with respect to Remaining Failures. The corresponding
standard deviations are 0.617 and 0.716.
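
Equations (4) and (5), together with the approximation of actual Remaining Failures described above, can be sketched as follows. The function names and all parameter values are hypothetical; in practice the model parameters would be estimated from failure data by a tool such as SMERFS.

```python
def total_failures(alpha, beta, x_before_s):
    """Equation (4): predicted failures over the life of the software."""
    return alpha / beta + x_before_s


def remaining_failures(alpha, beta, x_before_s, x_t):
    """Equation (5): predicted Total Failures minus failures observed by time t."""
    return total_failures(alpha, beta, x_before_s) - x_t


def actual_remaining_failures(x_at_T, x_at_t):
    """Approximation used for actual values: count at T (T >> t) minus count at t."""
    return x_at_T - x_at_t


# Hypothetical values, chosen only to make the arithmetic easy to follow.
alpha, beta = 0.6, 0.05   # initial failure rate, rate of change of failure rate
x_before_s = 2            # failures observed in intervals 1 .. s - 1
x_t = 9                   # failures observed up to the current interval t
kloc_change = 12.2

predicted = remaining_failures(alpha, beta, x_before_s, x_t)
print(f"Predicted remaining failures: {predicted:.2f}")
print(f"Normalized by KLOC change: {predicted / kloc_change:.3f}")
```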

5.2.3  Time  to  Next  Failure 

To predict the Time for the Next $F_t$ Failures to occur, when
the current time is t, we use (6) [1], [16], [17].

$$T_F(t) = \frac{\log\left[\alpha / \left(\alpha - \beta (X_{s,t} + F_t)\right)\right]}{\beta} - (t - s + 1) \tag{6}$$

The terms in $T_F(t)$ have the following definitions:

$t$: current time interval;
$X_{s,t}$: observed failure count in the range [s, t]; and
$F_t$: given number of failures to occur after interval t
(e.g., one failure).
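
A minimal sketch of (6) follows. The function name and the parameter values are hypothetical, and returning infinity when the argument of the logarithm is undefined is a choice made for this sketch: in that case the model predicts fewer than $F_t$ further failures.

```python
import math


def time_to_next_failures(alpha, beta, s, t, x_s_t, f_t=1):
    """Equation (6): predicted intervals until f_t more failures occur.

    x_s_t is the observed failure count in intervals s .. t.
    """
    limit = alpha - beta * (x_s_t + f_t)
    if limit <= 0:
        # Sketch assumption: the model predicts fewer than f_t further
        # failures, so the logarithm is undefined.
        return math.inf
    return math.log(alpha / limit) / beta - (t - s + 1)


# Hypothetical parameters and counts, for illustration only.
alpha, beta, s, t = 0.6, 0.05, 1, 10
print(time_to_next_failures(alpha, beta, s, t, x_s_t=5, f_t=1))
```

A useful comparison against mission duration follows directly: if the predicted time to next failure exceeds the mission duration, the risk criterion mentioned in Section 3.2 is met for that prediction.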


Fig. 5. Reliability of maintained software: time to next failure. (Series: Predicted, Actual; x-axis: Months Since Release of First OI; OIs plotted: A, B, C, D, E, J)

We want (6) to show an increasing trend over a sequence
of releases. Predicted and actual values are plotted for six
OIs (OIO has no failures) in Fig. 5. The CM values for these
plots are -0.152 and -0.065, respectively, indicating slight
instability with respect to time to next failure. The
corresponding standard deviations are 0.693 and 0.630.

We predicted values of Total Failures, Remaining Failures,
and Time to Next Failure as indicators of the risk of
operating software in the future: Is the predicted future
reliability of software an acceptable risk? The risk to the
mission may or may not be acceptable. If the latter, we
take action to improve the maintained product or the
maintenance process. We use actual values to measure the
reliability of software and the risk of deploying it resulting
from maintenance actions.

5.3  Summary 

We  summarize  change  metric  values  in  Table  4.  Overall 
(i.e.,  average  CM),  the  values  indicate  marginal  stability.  If 
the  majority  of  the  results  and  the  average  CM  were 
negative,  this  would  be  an  alert  to  investigate  the  cause.  The 
results  could  be  caused  by:  1)  greater  functionality  and 
complexity  in  the  software  over  a  sequence  of  releases,  2)  a 
maintenance  process  that  needs  to  be  improved,  or  3)  a 
combination  of  these  causes. 
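
As a worked check of the overall figure, the CM values in the Actual column of Table 4 average as follows (the symbol $\overline{CM}$ is notation introduced here for the average):

$$\overline{CM}_{\text{actual}} = \frac{-0.060 + 0.116 + 0.087 + 0.277 - 0.065}{5} = \frac{0.355}{5} = 0.071$$

The positive sign and small magnitude correspond to marginal stability.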


TABLE 4

Change Metric Summary

Metric                        Actual   Predicted
Mean Time To Failure          -0.060   -
Total Test Time per KLOC      0.116    -
Total Failures per KLOC       0.087    0.115
Remaining Failures per KLOC   0.277    0.107
Time to Next Failure          -0.065   -0.152
Average                       0.071    -

Fig. 6. Total test time to achieve remaining failures. (x-axis: Number of Remaining Failures)


6  Metrics  for  Long-Term  and  Short-Term 
Analysis 

In  addition  to  the  long-term  maintenance  criteria,  it  is 
desirable  that  the  maintenance  effort  results  in  increasing 
reliability  within  each  release  or  OI.  One  way  to  evaluate 
how  well  we  achieve  this  goal  is  to  predict  and  observe  the 
amount  of  test  time  that  is  required  to  reach  a  specified 
number  of  Remaining  Failures.  In  addition,  we  want  the  test 
effort  to  be  efficient  in  finding  residual  faults  for  a  given  OI. 
Furthermore,  number  of  Remaining  Failures  serves  as  an 
indicator  of  the  risk  involved  in  using  the  maintained 
software  (i.e.,  a  high  value  of  Remaining  Failures  portends  a 
significant  number  of  residual  faults  in  the  code).  In  the 
analysis  that  follows  we  use  predictions  and  actual  data  for 
a  selected  OI  to  illustrate  the  process:  OID. 

6.1  Total  Test  Time  Required  for  Specified 
Remaining  Failures 

We predict the Total Test Time that is required to achieve a
specified number of Remaining Failures, $r(t_t)$, at time $t_t$, by
(7) [1], [17]:

$$t_t = \frac{\log\left[\alpha / \left(\beta\, r(t_t)\right)\right]}{\beta} + (s - 1) \tag{7}$$

We  plot  predicted  and  actual  Total  Test  Time  for  OID  in 
Fig.  6  against  given  number  of  Remaining  Failures.  The  two 
plots  have  similar  shapes  and  show  the  typical  asymptotic 
characteristic of reliability (e.g., Remaining Failures) vs.
Total  Test  Time.  These  plots  indicate  the  possibility  of  big 
gains  in  reliability  in  the  early  part  of  testing;  eventually  the 
gains  become  marginal  as  testing  continues.  The  figure  also 
shows  how  risk  is  reduced  with  a  decrease  in  Remaining 
Failures  that  is  accomplished  with  increased  testing. 
Predicted  values  are  used  to  gauge  how  much  maintenance 
test  effort  would  be  required  to  achieve  desired  reliability 
goals  and  whether  the  predicted  amount  of  Total  Test  Time 
is  technically  and  economically  feasible.  We  use  actual 
values to judge whether the maintenance test effort has
been  efficient  in  relation  to  the  achieved  reliability. 
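
The asymptotic shape described above follows directly from (7): each halving of the Remaining Failures target adds the same increment of test time. A minimal sketch, with a hypothetical function name and hypothetical parameter values:

```python
import math


def total_test_time_for_remaining(alpha, beta, s, r):
    """Equation (7): intervals of test time needed to reach r Remaining Failures."""
    if r <= 0:
        raise ValueError("r must be positive; the model reaches zero only asymptotically")
    return math.log(alpha / (beta * r)) / beta + (s - 1)


# Hypothetical model parameters, for illustration only.
alpha, beta, s = 0.6, 0.05, 1
for r in (5, 4, 3, 2, 1):
    t_needed = total_test_time_for_remaining(alpha, beta, s, r)
    print(f"{r} remaining failures: {t_needed:.1f} intervals")
```

Comparing such predicted test times with the schedule and budget is one way to judge whether a reliability goal is technically and economically feasible.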


Fig. 7. OID failure rate. (x-axis: Percent of Total Test Time)

Fig. 8. OID rate of change of failure rate.

Fig. 9. OID failure rate predicted vs. actual.

6.2 Failure Rate

In  the  short  term  (i.e.,  within  a  release),  we  want  the  Failure 
Rate  (1/MTTF)  of  an  OI  to  decrease  over  an  OI's  Total  Test 
Time,  indicating  increasing  reliability.  Practically,  we 
would  look  for  a  decreasing  trend,  after  an  initial  period 
of  instability  (i.e.,  increasing  rate  as  personnel  learn  how  to 
maintain  new  software).  In  addition,  we  use  various  shape 
metrics,  as  defined  previously,  to  see  how  quickly  we  can 
achieve  reliability  growth  with  respect  to  test  time 
expended.  Furthermore,  Failure  Rate  is  an  indicator  of  the 
risk  involved  in  using  the  maintained  software  (i.e.,  an 
increasing  failure  rate  indicates  an  increasing  probability  of 
failure  with  increasing  use  of  the  software). 


Failure Rate = Total Number of Failures During Test / Total Test Time    (8)

We  plot  (8)  for  OID  in  Fig.  7  against  Total  Test  Time 
since  the  release  of  OID.  Fig.  7  does  show  that  short-term 
stability  is  achieved  (i.e.,  failure  rate  asymptotically 
approaches  zero  with  increasing  Total  Test  Time).  In 
addition, this curve shows when the failure rate transitions
from unstable (increasing Failure Rate) to stable (decreasing
Failure Rate). The figure also shows how risk is reduced
with  decreasing  Failure  Rate  as  the  maintenance  process 
stabilizes.  Furthermore,  in  Fig.  8  we  plot  the  rate  of  change 
(i.e.,  slope)  of  the  Failure  Rate  of  Fig.  7.  This  curve  shows 
the  percent  of  Total  Test  Time  when  the  rate  of  change  of 
Failure  Rate  reaches  its  maximum  negative  value.  We  use 
these plots to assess whether we have achieved short-term
stability in the maintenance process (i.e., whether Failure
Rate decreases asymptotically with increasing Total Test
Time). If we obtain contrary results, this would be an alert
to investigate whether this is caused by: 1) greater
functionality and complexity of the OI as it is being
maintained, 2) a maintenance process that needs to be
improved, or 3) a combination of these causes.

TABLE 5

Percent of Total Test Time Required to Achieve Reliability Goals and Change Metrics (CM)

Operational  One Remaining  Relative  Stable         Relative  Maximum Failure  Relative
Increment    Failure        Change    Failure Rate   Change    Rate Change      Change
             (% Test Time)            (% Test Time)            (% Test Time)
A            77.01          -         76.99          -         76.99            -
B            64.11          0.168     64.11          0.167     64.11            0.167
C            32.36          0.495     10.07          0.843     10.07            0.843
D            84.56          -0.617    12.70          -0.207    22.76            -0.558
E            83.29          0.015     61.45          -0.793    61.45            -0.630
J            76.88          0.077     76.89          -0.201    76.89            -0.201
O            46.49          0.395     100.00         -0.231    100.00           -0.231
CM                          0.089                    -0.070                     -0.101
STD DEV                     0.392                    0.543                      0.544

TABLE 6

Shuttle Operational Increment Functionality

Operational  Release   KLOC    Operational Increment Function
Increment    Date      Change
A            9/1/83    8.0     Redesign of Main Engine Controller.
B            12/12/83  11.4    Payload Re-manifest Capabilities.
C            6/8/84    5.9     Crew Enhancements.
D            10/5/84   12.2    Experimental Orbit Autopilot. Enhanced Ground Checkout.
E            2/15/85   8.8     Western Test Range. Enhance Propellant Dumps.
F            12/17/85  6.6     Centaur.
G            6/5/87    6.3     Post 51-L (Challenger) Safety Changes.
H            10/13/88  7.0     System Improvements.
I            6/29/89   12.1    Abort Enhancements.
J            6/18/90   29.4    Extended Landing Sites. Trans-Atlantic Abort Code Co-Residency.
K            5/21/91   21.3    Redesigned Abort Sequencer. One Engine Auto Contingency Aborts.
                               Hardware Changes for New Orbiter.
L            6/15/92   34.4    Abort Enhancements.
M            7/15/93   24.0    On-Orbit Changes.
N            7/13/94   10.4    MIR Docking. On-Orbit Digital Autopilot Changes.
O            10/18/95  15.3    Three Engine Out Auto Contingency.
P            7/16/96   7.3     Performance Enhancements.
Q            3/5/97    11.0    Single Global Positioning System.

TABLE 7

Chronology of Process Improvements

Year in which Process    Process Improvement
Improvement Introduced
1976                     Structured Flows
1977                     Formal Software Inspections
1978                     Formal Inspection Moderators
1980                     Formalized Configuration Control
1981                     Inspection Improvements
1982                     Configuration Management Database
1983                     Oversight Analyses
                         Build Automation
1984                     Formalized Requirements Analysis
1985                     Quarterly Quality Reviews
                         Prototyping
1986                     Inspection Improvements
                         Formal Requirements Inspections
1987                     Process Applied to Support Software
1988                     Reconfiguration Certification
                         Reliability Modeling and Prediction
1989                     Process Maturity Measurements
1990                     Formalized Training
1992                     Software Metrics

Another  way  of  looking  at  failure  rate  with  respect  to 
stability  and  risk  is  the  annotated  Failure  Rate  of  OID 
shown  in  Fig.  9,  where  we  show  both  the  actual  and 
predicted Failure Rates. We use (8) and (9) [1] to compute
the actual and predicted Failure Rates, respectively, where i
is a vector of time intervals for i ≥ s in (9).

$$f(i) = \alpha \exp\left(-\beta (i - s + 1)\right) \tag{9}$$

A  30-day  interval  has  been  found  to  be  convenient  as  a 
unit  of  NASA  Space  Shuttle  test  time  because  testing  can 
last  for  many  months  or  even  years.  Thus,  this  is  the  unit 
used in Fig. 9, where we show the following events in
intervals, where the predictions were made at 12.73
intervals:

Release time: 0 intervals,
Launch time: 13.90 intervals,
Predicted time of maximum Failure Rate: 6.0 intervals,
Actual time of maximum Failure Rate: 7.43 intervals,
Predicted maximum Failure Rate: 0.5735 failures per interval, and
Actual maximum Failure Rate: 0.5381 failures per interval.

In Fig. 9, stability is achieved after the maximum failure
rate occurs. This is at i = s (i.e., i = 6 intervals) for
predictions because (9) assumes a monotonically decreasing
failure rate, whereas the actual failure rate increases,
reaches a maximum at 7.43 intervals, and then decreases.
Once stability is achieved, risk decreases.
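
The monotonic behavior of (9) is easy to see numerically. In the sketch below the function name and the values of alpha and beta are hypothetical; only s = 6 is taken from the OID example above.

```python
import math


def predicted_failure_rate(alpha, beta, s, i):
    """Equation (9): predicted failures per interval at interval i >= s."""
    return alpha * math.exp(-beta * (i - s + 1))


# alpha and beta are hypothetical; s = 6 as in the OID example.
alpha, beta, s = 0.6, 0.05, 6
intervals = list(range(s, s + 10))
predicted = [predicted_failure_rate(alpha, beta, s, i) for i in intervals]

for i, rate in zip(intervals, predicted):
    print(f"interval {i}: {rate:.4f} failures per interval")

# The predicted maximum always falls at the first interval, i = s.
print("interval of maximum predicted rate:", intervals[predicted.index(max(predicted))])
```

Because the predicted curve can only decrease, any initial rise in the actual failure rate, as observed for OID, shows up as a gap between the actual and predicted times of maximum Failure Rate.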

6.3 Summary

In  addition  to  analyzing  short-term  stability  with  these 
metrics,  we  use  them  to  analyze  long-term  stability  across 
releases.  We  show  the  results  in  Table  5  where  the  percent 
of  Total  Test  Time  to  achieve  reliability  growth  goals  is 
tabulated  for  a  set  of  OIs,  using  actual  failure  data,  and  the 
Change  Metrics  are  computed.  Overall,  the  values  of  CM 
indicate  marginal  instability.  Interestingly,  except  for  OID, 
the  maximum  negative  rate  of  change  of  failure  rate  occurs 
when Failure Rate becomes stable, suggesting that maximum
reliability growth occurs when the maintenance
process  stabilizes. 

7  Space  Shuttle  Operational  Increment 
Functionality  and  Process  Improvement 

Table 6 shows the major functions of each OI [12] along
with the Release Date and KLOC Change repeated from
Table 1 and Table 2. There is not a one-for-one relationship
between KLOC Change and the functionality of the
change because, as stated earlier, KLOC is an indicator of
maintenance actions, not functionality. However, the software
developer states that there has been increasing
software functionality and complexity with each OI, in
some cases with less rather than more KLOC [8]. The focus
of  the  early  OIs  was  on  launch,  orbit,  and  landing.  Later 
OIs,  as  indicated  in  Table  6,  built  upon  this  baseline 
functionality  to  add  greater  functionality  in  the  form  of  MIR 
docking  and  the  Global  Positioning  System  (GPS),  for 
example.  Table  7  shows  the  process  improvements  that 
have  been  made  over  time  on  this  project,  indicating 
continuous  process  improvement  across  releases. 

The  stability  analysis  that  was  performed  yielded  mixed 
results:  About  half  are  favorable  and  half  are  unfavorable. 
Some  variability  in  the  results  may  be  due  to  gaps  in  the  data 
caused  by  OIs  that  have  experienced  insufficient  failures  to 
permit  statistical  analysis.  Also,  we  note  that  the  values  of  CM 
are  marginal  for  both  the  favorable  and  unfavorable  cases. 
Although there is not pronounced stability, neither is there
pronounced  instability.  If  there  were  consistent  and  large 
negative  values  of  CM,  it  would  be  cause  for  alarm  and  would 
suggest  the  need  to  perform  a  thorough  review  of  the  process. 
This  is  not  the  case  for  the  NASA  Space  Shuttle.  We  suspect, 
but  cannot  prove,  that  in  the  absence  of  the  process 
improvements  of  Table  7  the  CM  values  would  look  much 
worse.  It  is  very  difficult  to  associate  a  specific  product 
improvement  with  a  specific  process  improvement.  A 
controlled  experiment  would  be  necessary  to  hold  all  process 
factors  constant  and  observe  the  one  factor  of  interest  and  its 
influence on product quality. This is infeasible in
industrial organizations. However, we suggest that
a series of process improvements is beneficial for
product quality and that a set of CM values can serve to
highlight possible process problems.

8  Conclusions 

As stated in the Introduction, the authors' emphasis in this
paper was to propose a unified product and process
measurement model for both product evaluation and
process stability analysis. We were less interested in the
results  of  the  NASA  Space  Shuttle  stability  analysis,  which 
was  used  to  illustrate  the  model  concepts.  The  authors 
concluded,  based  on  both  predictive  and  retrospective  use 
of  reliability,  risk,  and  test  metrics,  that  it  is  feasible  to 
measure  and  assess  both  product  quality  and  the  stability  of 
a  maintenance  process.  The  model  is  not  domain  specific. 
Different  organizations  may  obtain  different  numerical 
results  and  trends  than  the  ones  we  obtained  for  the  NASA 
Space  Shuttle. 
Cost as the Universal COTS Metric

We focus on factors that the user should consider
when deciding whether to use COTS software. We take
the approach of using cost as the common denominator.
This is done for two reasons. First, cost is obviously of
interest in making such decisions. Second, a single
metric, cost in dollars, can be used for evaluating the
pros and cons of using COTS. A single metric is needed
because various software system attributes, like
acquisition cost and availability (i.e., the percentage of
scheduled operating time that the system is available for
use), are non-commensurate quantities. That is, we cannot
relate quantitatively "a low acquisition cost" with "high
availability". These units are neither additive nor
multiplicative. However, if it were possible to translate
availability into either a cost gain or loss for COTS
software, we could operate on these metrics
mathematically. Naturally, in addition to cost, the user
application is key in making the decision. Thus one
could  develop  a  matrix  where  one  dimension  is 
application  and  the  other  dimension  is  the  various  cost 
elements.  We  show  how  cost  elements  can  be  identified 
and  how  cost  comparisons  can  be  made  over  the  life  of 
the  software.  Obviously,  identifying  the  costs  would  not 
be  easy.  The  user  would  have  to  do  a  lot  of  work  to  set 
up  the  decision  matrix  but  once  it  was  constructed,  it 
would  be  a  significant  tool  in  the  evaluation  of  COTS. 
Furthermore,  even  if  all  the  required  data  cannot  be 
collected,  having  a  framework  that  defines  software 
system  attributes  would  serve  as  a  user  guide  for  factors 
to  consider  when  making  the  decision  about  whether  to 
use  COTS  software  or  in-house  developed  software. 

Certainly,  different  applications  would  have  varying 
degrees  of  relationships  with  the  cost  elements.  For 
example,  flight  control  software  would  have  a  stronger 
relationship  with  the  cost  of  unavailability  than  a 
spreadsheet  application.  Conversely,  the  latter  would 
have  a  stronger  relationship  with  the  cost  of  inadequacy 
of  tool  features  than  the  former.  Due  to  the  difficulty  of 
identifying  specific  COTS-related  costs,  our  initial 
approach  is  to  identify  cost  elements  on  the  ordinal  scale. 
Thus, the first version of the decision matrix would
involve ordinal scale metrics (i.e., the cost of
unreliability is more important for flight control software
than for spreadsheet applications). As the field of COTS
analysis  matures  and  as  additional  data  is  collected  about 
the  cost  of  using  COTS,  we  will  be  able  to  refine  our 
metrics  to  the  ratio  scale  (e.g.,  the  cost  of  unreliability  in 
COTS  systems  is  two  times  that  in  custom  systems). 

The  cost  elements  for  comparing  COTS  software 
with  in-house  software  are  identified  below.  This  list  is 
not  exhaustive;  its  purpose  is  to  illustrate  the  approach. 
These elements apply whether we are comparing a
system composed entirely of COTS components with one
composed entirely of in-house components or comparing
only a subset of COTS
components with corresponding in-house components.
Explanatory  comments  are  made  where  necessary.  Mean 
values  are  used  for  some  quantities  in  the  initial 
framework.  This  is  the  case  because  it  will  be  a  challenge 
to  collect  any  data  for  some  applications.  Therefore,  the 
initial  framework  should  not  be  overly  complex. 
Variance  and  statistical  distribution  information  could  be 
included  as  enhancements  if  the  initial  framework  proves 
successful. 

Cost  Elements 

Cc(j) = Cost of acquiring COTS software in year j.

Ci(j) = Cost of developing in-house software in year j.

Uc(j) = Cost of upgrading COTS software in year j.

Ui(j) = Cost of upgrading in-house software in year j.

P(j)  =  Cost  of  personnel  who  use  the  software  system  in 
year  j.  This  quantity  represents  the  value  to  the  customer 
of  using  the  software  system. 

Mc(j)  =  Cost  per  unit  time  of  repairing  a  fault  in  COTS 
software  in  year  j.  This  is  the  cost  of  customer  time 
involved  in  resolving  a  problem  with  the  vendor. 

Mi(j)  =  Cost  per  unit  time  of  repairing  a  fault  in  in-house 
software  in  year  j. 
Rc(j) = Mean time of repairing a fault that causes a
failure in COTS software in year j. This is the average
time that the user spends in resolving a problem with the
vendor.

Ri(j) = Mean time of repairing a fault that causes a
failure in in-house software in year j.

T(j) = Scheduled operating time for the software system
in year j.

Ac(j) = Availability of software system that uses COTS
software in year j.

Ai(j) = Availability of software system that uses software
developed in-house in year j.

These quantities are the fractions of T(j) that the software
system is available for use.

Fc(j) = Failure rate of COTS software in year j.

Fi(j) = Failure rate of in-house software in year j.

These quantities are the number of failures per year that
cause loss of productivity and availability of the software
system.

In some applications, some or all of the above
quantities may be known or assumed to be constant over
the life of the software system. Using the above cost
elements, we derive the equations for the annual costs of
the two systems and the difference in these costs. In the
cost difference calculations that follow, a positive
quantity is favorable to in-house development and a
negative quantity is favorable to COTS.

Cost  of  Acquiring  Software 

Difference in annual cost = Cc(j) - Ci(j)   (1)

Cost of Upgrading Software

Difference in annual cost = Uc(j) - Ui(j)   (2)

Cost of Software being Unavailable for Use

Annual cost of COTS software being unavailable for use
= (1 - Ac(j)) * P(j).

Annual cost of the in-house software being unavailable
for use
= (1 - Ai(j)) * P(j).

Difference in annual cost = P(j) * (Ai(j) - Ac(j))   (3)

Cost of Repairing Software

Average annual cost of repairing failed COTS software
= Fc(j) * T(j) * Rc(j) * Mc(j).

Average annual cost of repairing failed in-house software
= Fi(j) * T(j) * Ri(j) * Mi(j).

Difference in annual cost =
T(j) * ((Fc(j) * Rc(j) * Mc(j)) - (Fi(j) * Ri(j) * Mi(j)))   (4)

Then,  TCj,  total  difference  in  cost  in  year  j,  is  the  sum  of 
(1),  (2),  (3),  and  (4).  Because  there  is  the  opportunity  to 
invest  funds  in  alternate  projects,  costs  in  different  years 
are  not  equivalent  (i.e.,  funds  available  today  have  more 
value  than  an  equal  amount  in  the  future  because  they 
could  be  invested  today  and  earn  a  future  return). 
Therefore,  a  stream  of  costs  over  the  life  of  the  software 
for  n  years  must  be  discounted  by  k,  the  rate  of  return  on 
alternate  use  of  funds.  Thus  the  total  discounted  cost 
differential  between  COTS  software  and  in-house 
software  is: 

$$\sum_{j=1}^{n} \frac{TC_j}{(1 + k)^j}$$

In  this  initial  formulation,  we  have  not  included 
possible  differences  in  functionality  between  the  two 
approaches.  However,  a  reasonable  assumption  is  that 
COTS  software  would  not  be  considered  unless  it  could 
provide  minimum  functionality  to  satisfy  user 
requirements.  Thus,  a  typical  decision  for  the  user  is 
whether  it  is  worth  the  additional  life  cycle  costs  to 
develop  an  in-house  software  system  with  all  the 
desirable  attributes. 
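
The cost comparison above can be sketched in Python. This is a minimal illustration under stated assumptions, not a tool from the original work: the `YearCosts` container and the function names are hypothetical, the field names simply mirror the cost element symbols, and discounting is assumed to start with year j = 1.

```python
from dataclasses import dataclass


@dataclass
class YearCosts:
    """Cost elements for one year j; field names mirror the symbols in the text."""
    Cc: float  # cost of acquiring COTS software
    Ci: float  # cost of developing in-house software
    Uc: float  # cost of upgrading COTS software
    Ui: float  # cost of upgrading in-house software
    P: float   # cost of personnel who use the system
    Ac: float  # availability with COTS software (fraction of T)
    Ai: float  # availability with in-house software (fraction of T)
    Fc: float  # failure rate of COTS software
    Fi: float  # failure rate of in-house software
    Rc: float  # mean repair time, COTS
    Ri: float  # mean repair time, in-house
    Mc: float  # repair cost per unit time, COTS
    Mi: float  # repair cost per unit time, in-house
    T: float   # scheduled operating time


def annual_difference(y: YearCosts) -> float:
    """Sum of differences (1) to (4); positive favors in-house, negative favors COTS."""
    acquire = y.Cc - y.Ci                                      # (1)
    upgrade = y.Uc - y.Ui                                      # (2)
    unavailable = y.P * (y.Ai - y.Ac)                          # (3)
    repair = y.T * (y.Fc * y.Rc * y.Mc - y.Fi * y.Ri * y.Mi)   # (4)
    return acquire + upgrade + unavailable + repair


def discounted_difference(years: list, k: float) -> float:
    """Total discounted cost differential over the listed years at rate of return k."""
    return sum(annual_difference(y) / (1 + k) ** j
               for j, y in enumerate(years, start=1))
```

A user would fill one `YearCosts` record per year of the software's life and compare the sign of `discounted_difference` with the convention stated earlier.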
Abstract

We develop a new metric, Relative Critical Value Deviation
(RCVD), for classifying and predicting software
quality. The RCVD is based on the concept that the extent
to which a metric's value deviates from its critical value,
normalized by the scale of the metric, indicates the degree
to which the item being measured does not conform to a
specified norm. For example, the deviation in body temperature
above 98.6 degrees Fahrenheit is a surrogate for
fever. Similarly, the RCVD is a surrogate for the extent to
which the quality of software deviates from acceptable
norms (e.g., zero discrepancy reports). Early in development,
surrogate metrics are needed to make predictions of
quality before quality data are available. The RCVD can
be computed for a single metric or multiple metrics. Its
application is in assessing newly developed modules by
their quality in the absence of quality data. The RCVD is a
part of the larger framework of our measurement models
that include the use of Boolean Discriminant Functions
for classifying software quality. We demonstrate our concepts
using Space Shuttle flight software data.

Keywords:  Quality  classification  and  prediction,  relative 
critical  value  deviation  metrics. 

1.  Introduction 

Our  goal  is  to  provide  models  and  processes  to  assist 
software  managers  in  answering  the  following  questions: 

•  How  can  I  control  the  quality  of  my  software? 

•  How  can  I  predict  the  quality  of  my  software? 

•  How  shall  I  prioritize  my  effort  to  achieve  my  quality 
goals? 

•  How  can  I  determine  whether  my  quality  goals  are 
being  met? 

•  How  much  will  it  cost  to  achieve  my  quality  goals? 

We develop quality control and prediction models that are
used to identify modules requiring priority attention during
development and maintenance. This is accomplished
in two activities: validation and application. During validation,
we use a build of the software that has been developed
as the source of data to compute Boolean Discriminant
Functions (BDFs), Relative Critical Value Deviation
(RCVD) metrics, and regression equations that we use to
retrospectively classify and predict quality with specified
accuracy, by build and module. Using these functions and
equations during application, we classify and predict the
quality of new software that is being developed. This is
the quality we expect to achieve during maintenance.
During validation, both quality factor (e.g., discrepancy
reports of deviations between requirements and implementation)
and software metrics (e.g., size, structural) data
are available; during application, only the latter are available.
During validation, we construct Boolean discriminant
functions (BDFs) composed of a set of metrics and
their critical values (i.e., thresholds) [1, 2]. We select the
best BDF based on its ability to achieve the maximum
relative incremental quality/cost ratio. During application,
if at least one of the module's metrics has a value that exceeds
its critical value, the module is identified as "high
priority" (i.e., low quality); otherwise, it is identified as
"low priority" (i.e., high quality). Our objective is to identify
and correct quality problems during development, as
opposed to waiting until maintenance when the cost of
correction would be high. This process addresses the
question: "How can I control the quality of my software?"
Because BDFs only provide an accept/reject decision on
module quality, during validation, we also construct
RCVDs that are used to prioritize the effort applied to
rejected modules. In other words, an RCVD measures the
degree to which quality is low. This process addresses the
question: "How shall I prioritize my effort to achieve my
quality goals?"

An RCVD is a derived metric, based on the normalized
deviation between a metric's value and its critical value. It
may be based on a single or multiple metrics. In our process,
we: 1) identify the critical values of the metrics and 2)
find the optimal BDF and RCVD based on their ability to
satisfy both statistical and application criteria. Statistical
criteria refer to the ability to correctly classify the software
(i.e., classify high quality software as high quality and low
quality software as low quality). Application criteria refer
to the ability to achieve a high quality/cost ratio. This process
addresses the questions: "How can I determine
whether my quality goals are being met?" and "How much
will it cost to achieve my quality goals?"

RCVD values that exceeded the 80th percentile value
were able to account for two-thirds of the discrepancy
reports. To round out our approach, we use regression
equations to predict quality limits. This is desirable because,
although BDFs and RCVDs control and predict
quality based on expected values, they are not capable of
predicting the range of quality values.

We show that it is important to perform a marginal
analysis (i.e., identification of the incremental contribution
of each metric to improving quality) when making a decision
about how many metrics to include in the BDFs and
RCVDs. If many metrics are added to the set at once, the
contribution of individual metrics is obscured. Also, the
marginal analysis provides an effective rule for deciding
when to stop adding metrics.

The  contributions  of  this  research  are  the  following:  1) 
the  Relative  Critical  Value  Deviation  (RCVD)  is  a  new 
metric  for  classifying  and  predicting  software  quality;  2) 
the RCVDs, in combination with the BDFs we previously
developed, allow the software manager to both control
quality  and  prioritize  the  effort  required  to  achieve  quality 
goals;  3)  BDFs,  RCVDs,  and  regression  equations  are 
integrated  into  a  process  to  assist  the  software  manager  in 
answering  the  questions  posed  in  the  introduction;  and  4) 
the  data  and  most  of  the  calculations  are  implemented  in  a 
spreadsheet  for  easy  transfer  to  practitioners. 

1.1  Related  Research 

Our models are in the class of models concerned with
the classification, control, and prediction of quality. Other
researchers have had similar objectives but different approaches.
Porter and Selby used classification trees to partition
multiple metric value space so that a sequence of
metrics and their critical values could be identified that
were associated with either high quality or low quality
software [3]. This technique is closely related to our approach
of identifying a set of metrics and their critical
values that will satisfy quality and cost criteria. However,
we use statistical analysis to make the identification.

Briand et al. used logistic regression to classify modules
as fault-prone or not fault-prone as a function of various
object oriented metrics [4]. In another example of
logistic regression, Khoshgoftaar and Allen used it to classify
modules as fault-prone or not fault-prone as a function
of faults, requirements, performance, and documentation
software trouble report metrics [5]. While one of our objectives
is similar (classify modules as either high quality
or low quality), we derive from this binary classification
several predictive continuous quality and cost metrics,
including the RCVDs. These metrics are used to predict
the quality of software that will be delivered by development
to maintenance and the cost of achieving it.

Khoshgoftaar et al. used nonparametric discriminant
analysis in each iteration of a military system project to
predict fault-prone modules in the next iteration [6]. This
approach provided early indication of reliability and the
risk of implementing the next iteration. They conducted a
similar study involving a telecommunications application,
again using nonparametric discriminant analysis, to classify
modules as either fault-prone or not fault-prone [7].
Our approach has the same objective but we produce
BDFs and RCVDs in terms of the original metrics as opposed
to using density functions as discriminators.

Khoshgoftaar and Allen have also developed models
for ranking modules for reliability improvement according
to their degree of fault-proneness as opposed to whether
they are fault-prone or not [8]. They used Alberg Diagrams
[9] that predict percentage of faults as a function of
percentage of modules by ordering modules in decreasing
order of faults and noting the cumulative number of faults
corresponding to various percentages of modules. Our
approach is similar but we accomplish the same objective
by sorting the modules by RCVD and finding its percentile
distribution and the corresponding drcount percentile
distribution, as we explain later.

2.  Discriminative  Power  Model 

2.1.  Discriminative  Power  Validation 

Using our metrics validation methodology [10, 11],
and the Space Shuttle flight software metrics and discrepancy
reports (DRs), we validate metrics with respect to the
quality factor drcount. This is the number of discrepancy
reports written against a module. In brief, this involves
conducting statistical tests to determine whether there is a
high degree of association between drcount and candidate
metrics. As shown in Figure 1, we validate metrics on
Build 1 (1397 modules) and apply them to Build 2 (846
modules) of the Space Shuttle flight software. Nikora and
Munson argue for the need of a measurement baseline
against which evolving systems may be compared [12].
Our baseline is Build 1 in Figure 1. The measurement results
from Build 1 provide the data source for controlling
and predicting the quality delivered to maintenance and
for comparing predicted with actual quality, once the latter
is known. Next, we define Discriminative Power.

2.1.1.  Discriminative  Power 

Given the elements Mij of a matrix of n modules and
m metrics (i.e., nm metric values), the elements MCj of a
vector of m metric critical values, the elements Fi of a
vector of n quality factor values, and scalar FC of quality
factor critical value, Mij must be able to discriminate with
respect to Fi for a specified FC, as shown below:

$$M_{ij} > MC_j \Leftrightarrow F_i > FC \quad \text{and} \quad M_{ij} \le MC_j \Leftrightarrow F_i \le FC \qquad (1)$$

for i=1,2,...,n, and with specified α, where α is
the significance level of various statistical tests that are
used for estimating the degree to which a set of metrics
can correctly classify software quality. In other words, do
the indicated metric relations imply corresponding quality
factor relations in (1)? This criterion assesses whether MCj
has sufficient Discriminative Power to be capable of distinguishing
a set of high quality modules from a set of low
quality modules. If so, we use the critical values in Quality
Control and Prediction described below. The validation
process is illustrated in Figure 1, where the critical values
MCj are produced during the Test phase of Build 1 by using
the metrics Mij from the Design phase and the quality
factor Fi (e.g., drcount) available in the Test phase. (Discrepancy
Reports are written against the software
throughout development but they are not significantly
complete until the end of the Test phase during which
failures are observed). The desired quality level is set by
the choice of FC. The lower its value, the higher the
quality requirement; conversely, the higher its value, the
lower the requirement. A value of zero is appropriate for
safety-critical systems like the Space Shuttle.

2.2.  Relative  Critical  Value  Deviation  (RCVD) 
Metric 

The RCVD is based on the concept that the extent to
which a metric's value deviates from its critical value,
normalized by the scale of the metric, is an indicator of the
degree to which the entity being measured does not conform
to a specified norm. For example, the extent to which
body temperature exceeds 98.6 degrees Fahrenheit is an
indicator of the deviation from an established norm of
human health. Measurement involves using surrogates: the
deviation in temperature above 98.6 degrees is a surrogate
for fever. Similarly, the RCVD is a surrogate for the extent
that software quality deviates from acceptable norms
(e.g., zero discrepancy reports). The concept of the RCVD
is shown in Figure 2, where the metric and quality scales
are shown, defined by the maximum (MXj) and minimum
(MNj) metric boundaries and the maximum (FX) and
minimum (FN) quality boundaries, respectively. The theory
of the RCVD is given by the following relation:

$$RCVD_{ij} = \frac{M_{ij} - MC_j}{MX_j - MN_j} \propto \frac{F_i - FC}{FX - FN} \qquad (2)$$

This means that the deviation of a metric from its
critical value, normalized by metric length, is related to
the degree of quality, as represented by the normalized
deviation of a quality factor (e.g., drcount) from its critical
value: increasing positive deviations are related to
decreasing quality and increasing negative deviations are
related to increasing quality. It should not be inferred that
the relationship is linear or proportional; in fact, it is nonlinear.
In the idealized diagram in Figure 2, the worst
quality corresponds to MXj and FX, the best quality to MNj
and FN, and acceptable quality to MCj and FC. Also, Figure
2 does not indicate the mathematical form of F. If FN
is equal to zero and FC is set equal to zero, which is frequently
the case, Fi and FX can be replaced by the sum of
the quality factor across a set of modules and the total
quality factor, respectively. This quantity is the proportion
of drcount computed across a set of modules. An RCVD
can also be composed of multiple metrics by computing
their mean. Note that although it would not be valid to
compute the mean of metrics, the mean of RCVDs is another
story since these are normalized dimensionless
quantities. We experimented with both single and multiple
metric RCVDs, as we explain later.
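
The single and multiple metric forms of the RCVD can be sketched as follows. This is an illustrative Python fragment only: the function names are hypothetical, and the minimum and maximum values in the usage note are invented for the example rather than taken from the Space Shuttle data.

```python
def rcvd(value, critical, minimum, maximum):
    """Single-metric RCVD: (Mij - MCj) / (MXj - MNj)."""
    if maximum <= minimum:
        raise ValueError("maximum must exceed minimum")
    return (value - critical) / (maximum - minimum)


def mean_rcvd(values, criticals, minimums, maximums):
    """Multiple-metric RCVD for one module: mean of its single-metric RCVDs."""
    parts = [rcvd(v, c, lo, hi)
             for v, c, lo, hi in zip(values, criticals, minimums, maximums)]
    return sum(parts) / len(parts)
```

For instance, a module with metric values 77 and 13, critical values 27 and 63, and assumed scales 0 to 100 and 0 to 200 has single-metric RCVDs of 0.5 and -0.25; averaging is meaningful here only because each term is dimensionless.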

2.3.  Quality  Control  and  Prediction 

Quality control is the evaluation of modules with respect
to predetermined critical values of metrics. The purpose
of quality control is to identify software that does not
meet quality requirements early in the development process
so corrective action can be taken when the cost is low.
Quality control is applied during the Design phase of
Build 2 in Figure 1 to flag software that is below quality
limits for detailed inspection. The validated BDFs, composed
of the metrics Mij and their critical values MCj that
are obtained from Build 1, are used to either accept or
reject the modules of Build 2 [1, 2]. At this point during
the development of Build 2, only the metric data and
MCj are available. The validated RCVDs are used to prioritize
the attention and effort devoted to modules that are
rejected by the BDFs. Details are given later.
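
As a rough sketch of how the accept/reject decision and the prioritization might fit together, the Python fragment below rejects a module when any metric exceeds its critical value and then orders the rejected modules by mean RCVD. The function names, the dictionary layout, and the `span` argument (standing for MXj - MNj) are assumptions made for illustration.

```python
def bdf_rejects(metrics, critical):
    """True when at least one metric exceeds its critical value (Boolean OR)."""
    return any(metrics[name] > critical[name] for name in critical)


def prioritize(modules, critical, span):
    """Return names of rejected modules, highest mean RCVD first.

    modules  maps module name -> {metric name: value}
    critical maps metric name -> critical value MCj
    span     maps metric name -> MXj - MNj (assumed known from validation)
    """
    ranked = []
    for name, metrics in modules.items():
        if bdf_rejects(metrics, critical):
            score = sum((metrics[m] - critical[m]) / span[m]
                        for m in critical) / len(critical)
            ranked.append((score, name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked]
```

Accepted modules never appear in the ranking, which matches the idea that RCVDs direct effort only toward modules the BDF has already flagged.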

Quality predictions are used by the developer to anticipate
rather than react to quality problems. Figure 1 shows
the metrics controlling and predicting the quality of software
that will be delivered to maintenance early in the
development of Build 2. Accompanied by rigorous inspection
and test, this process will result in improved
quality of Build 2 and the software that is released to
maintenance. Once all of the quality factor data Fi (e.g.,
drcount) have been collected for Build 2, at the end of the
Test phase as shown in Figure 1, the quality of Build 2
would be known. This, then, becomes the actual quality of
Build 2 in the maintained software. Regression equations
Fi = f(Mij) are developed during the Test phase of Build 1
and applied to predicting quality limits during the Design
Phase of Build 2, as shown in Figure 1. This process addresses
the question: "How can I predict the quality of my
software?"

3.  Validation  Methodology

We use a five stage process to select metrics and metric
functions for quality control and prediction: 1) compute
critical values of the candidate metrics; 2) for the set
of candidate metrics and critical values, find the optimal
BDF based on statistical and application criteria; 3) apply
a stopping rule for adding metrics; 4) identify the best
RCVD for prioritizing quality assurance effort; and 5)
develop a regression equation that will accurately predict
quality limits (e.g., limits of drcount). Table 1 provides a
functional description of each stage. The five stages take
place during the Test Phase of Build 1 of Figure 1, once
all the quality factor data Fi (e.g., drcount) are available.
The next sections describe the analysis for each stage.

3.1.  Stage  1:  Compute  Critical  Values 

Critical values MCj are computed based on the Kolmogorov-Smirnov
(K-S) test [1, 2]. Table 1 shows the
metric definitions, critical values MCj, and K-S distances
for six metrics of Build 1. These metrics were selected
based on their relatively high K-S distance compared to
other metrics that had been collected on the Space Shuttle.
The test statistic is the maximum vertical difference between
the CDFs of two complementary sets of data (e.g.,
the CDFs of Mij for drcount ≤ FC and drcount > FC). If the
difference is significant (i.e., α < .005), the value of Mij
corresponding to maximum CDF difference is used for
MCj. This relationship is expressed in equation (3). Metrics
are added to the BDF in order of their K-S Distance.

$$\text{K-S}(MC_j) = \max \{ [CDF(M_{ij} \mid (F_i \le FC))] - [CDF(M_{ij} \mid (F_i > FC))] \} \qquad (3)$$
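
The search for a critical value in Stage 1 can be sketched in plain Python. This fragment is an illustration under stated assumptions: the function name is hypothetical, empirical CDFs stand in for the CDFs of the text, the absolute difference is taken, and the significance test on the resulting distance is omitted and would have to be carried out separately.

```python
def ks_critical_value(metric_values, factor_values, fc=0):
    """Return (critical value, K-S distance) for one metric.

    Splits the modules into two complementary groups (factor <= fc and
    factor > fc), then scans the observed metric values for the point
    where the two empirical CDFs are farthest apart.
    """
    pairs = list(zip(metric_values, factor_values))
    low = [m for m, f in pairs if f <= fc]    # high quality modules
    high = [m for m, f in pairs if f > fc]    # low quality modules
    if not low or not high:
        raise ValueError("both quality groups must be non-empty")
    best_distance, best_value = 0.0, None
    for x in sorted(set(metric_values)):
        cdf_low = sum(1 for m in low if m <= x) / len(low)
        cdf_high = sum(1 for m in high if m <= x) / len(high)
        distance = abs(cdf_low - cdf_high)
        if distance > best_distance:
            best_distance, best_value = distance, x
    return best_value, best_distance
```

Running this once per candidate metric gives both the candidate critical value and the distance by which metrics could be ordered before being added to the BDF.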

3.2.  Stage  2:  Form  a  Set  of  Boolean  Discriminant
Functions  (BDFs)

For each BDF identified in Stage 1 we use Table 2 to
further evaluate the ability of the functions to discriminate
high quality from low quality, from both statistical (e.g.,
misclassification rates) and application (e.g., ability of the
metric set to correctly classify low quality modules)
standpoints. In Table 2, MCj and FC classify modules into
one of four categories. The left column contains modules
where none of the metrics exceeds its critical value; this
condition is expressed with a Boolean AND function of
the metrics. This is the ACCEPT column, meaning that
according to the classification decision made by the metrics,
these modules have acceptable quality. The right column
contains modules where at least one metric exceeds
its critical value; this condition is expressed by a Boolean
OR function of the metrics. This is the REJECT column,
meaning that according to the classification decision made
by the metrics, these modules have unacceptable quality.
The top row contains modules that are high quality; these
modules have a quality factor that does not exceed its
critical value (e.g., drcount ≤ 0). The bottom row contains
modules that are low quality; these modules have a quality
factor that exceeds its critical value (e.g., drcount > 0).


Equation (4) gives the algorithms for making the cell
counts, using the BDFs of Fi and Mij that are calculated
over the n modules for m metrics. This equation is an implementation
of the relation given in (1).

$$
\begin{aligned}
C_{11} &= \text{COUNT FOR } ((F_i \le FC) \land (M_{i1} \le MC_1) \ldots \land (M_{ij} \le MC_j) \ldots \land (M_{im} \le MC_m)) \\
C_{12} &= \text{COUNT FOR } ((F_i \le FC) \land ((M_{i1} > MC_1) \ldots \lor (M_{ij} > MC_j) \ldots \lor (M_{im} > MC_m))) \\
C_{21} &= \text{COUNT FOR } ((F_i > FC) \land (M_{i1} \le MC_1) \ldots \land (M_{ij} \le MC_j) \ldots \land (M_{im} \le MC_m)) \\
C_{22} &= \text{COUNT FOR } ((F_i > FC) \land ((M_{i1} > MC_1) \ldots \lor (M_{ij} > MC_j) \ldots \lor (M_{im} > MC_m)))
\end{aligned}
\qquad (4)
$$

for j=1,...,m, and where COUNT(i)=COUNT(i-1)+1 FOR
Boolean expression true and COUNT(i)=COUNT(i-1),
otherwise; COUNT(0)=0. The counts (C11, C12, C21, and
C22) correspond to the cells of Table 2, where row and
column totals are also shown: n, n1, n2, N1, and N2.
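
The cell counts can be sketched as a single pass over the modules. This Python fragment is illustrative only: the function name and the argument layout (one tuple of metric values per module) are assumptions, and the critical values in the usage note are the two that the example validation quotes for prologue size and statements.

```python
def cell_counts(metric_rows, factors, critical, fc=0):
    """Return (C11, C12, C21, C22) for the 2x2 classification table.

    metric_rows: one sequence of metric values per module (Mi1..Mim)
    factors:     quality factor value per module (Fi)
    critical:    critical value per metric (MC1..MCm)
    """
    c11 = c12 = c21 = c22 = 0
    for row, f in zip(metric_rows, factors):
        # REJECT when at least one metric exceeds its critical value (OR).
        rejected = any(m > mc for m, mc in zip(row, critical))
        low_quality = f > fc
        if not low_quality and not rejected:
            c11 += 1      # high quality, ACCEPT
        elif not low_quality and rejected:
            c12 += 1      # high quality, REJECT
        elif low_quality and not rejected:
            c21 += 1      # low quality, ACCEPT
        else:
            c22 += 1      # low quality, REJECT
    return c11, c12, c21, c22
```

With critical values (63, 27), a module with metrics (70, 5) and no discrepancy reports lands in C12, while one with metrics (10, 5) and two discrepancy reports lands in C21.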

In  addition  to  counting  modules  in  Table  2,  we  must 
also  count  the  quality  factor  (e.g.,  drcount)  that  is  incor¬ 
rectly  classified.  This  is  shown  as  Remaining  Factor,  RF, 
in the ACCEPT column. This is the quality factor count on
modules  that  should  have  been  rejected.  Also  shown  is 
Total  Factor,  TF,  the  total  quality  factor  count  on  all  the 
modules  in  the  build.  Table  2  and  subsequent  equations 
show  an  example  validation,  where  the  combination  of 
metrics  from  Table  1  and  their  critical  values  for  Build  1  is 
prologue  size  (P)  with  a  critical  value  of  63,  statements 
(S)  with  a  critical  value  of  27,  and  eta2  (E2)  with  a  critical 
value  of  45.  This  is  the  optimal  BDF.  Later  we  will  ex¬ 
plain  how  we  arrived  at  this  particular  combination  of 
metrics  as  the  optimal  set.  The  results  of  the  following 
calculations  for  the  optimal  BDF  are  shown  in  Table  3. 

3.2.1.  Statistical  Criteria 

We validate a BDF statistically by demonstrating that it partitions Table 2 so that $C_{11}$ and $C_{22}$ are large relative to $C_{12}$ and $C_{21}$. If this is the case, a large number of high quality modules (e.g., modules with drcount ≤ 0) would have $M_{ij} \le MC_j$ and would be correctly classified as high quality. Similarly, a large number of low quality modules (e.g., modules with drcount > 0) would have $M_{ij} > MC_j$ and would be correctly classified as low quality. We evaluate partitioning ability using the misclassification rates.

3.2.2.  Misclassification 

We compute the degree of misclassification in Table 2 by noting that ideally $C_{11} = n_1 = N_1$, $C_{12} = 0$, $C_{21} = 0$, and $C_{22} = n_2 = N_2$. The extent to which this is not the case is estimated by Type 1 misclassifications (i.e., the module has Low Quality and the metrics "say" it has High Quality) and Type 2 misclassifications (i.e., the module has High Quality and the metrics "say" it has Low Quality). Thus, we define the following measures of misclassification:

Proportion of Type 1: $p_1 = C_{21}/n$   (5)

For the example, $p_1$ = (35/1397)*100 = 2.51%.

Proportion of Type 2: $p_2 = C_{12}/n$   (6)

For the example, $p_2$ = (344/1397)*100 = 24.62%.
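As a quick check of the two misclassification proportions, the following sketch recomputes them from the Build 1 cell counts reported in the text (477, 344, 35, and 541). The function name `misclassification_rates` is an assumption made for illustration.

```python
def misclassification_rates(c11: int, c12: int, c21: int, c22: int):
    """Return (p1, p2): Type 1 and Type 2 misclassification proportions."""
    n = c11 + c12 + c21 + c22   # total number of modules
    p1 = c21 / n                # low quality modules that were accepted
    p2 = c12 / n                # high quality modules that were rejected
    return p1, p2


# Build 1 counts from the example (n = 1397).
p1, p2 = misclassification_rates(477, 344, 35, 541)
print(f"p1 = {100 * p1:.2f}%, p2 = {100 * p2:.2f}%")  # p1 = 2.51%, p2 = 24.62%
```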

3.2.3.  Application  Criteria 

Because it is the performance of the metrics in the application context that counts, we also validate metrics with respect to the application criteria Quality and Inspection, which are related to quality achieved and the cost to achieve it, respectively [1,2]. During the Design phase of Build 2 in Figure 1, we predict that the quality computed by equations (7)-(9) will be delivered to maintenance, assuming that the modules rejected by the quality control process are inspected and tested and that the problems that are found are corrected. Furthermore, we predict that the degree of inspection computed by equation (10) will be required to achieve this quality. In addition to controlling and predicting quality, equations (7)-(9) can be used to address the question: "How can I determine whether my quality goals are being met?" For example, if a quality goal is ≤ 3% residual defects, the achievement of this goal can be measured by RFP, equation (9). Also, the degree of rigorous inspection, equation (10), can be used to address the question: "How much will it cost to achieve my quality goals?"

3.2.4.  Quality 

First, we estimate the metrics' ability to correctly classify quality, given that the quality is known to be low:
LQC: proportion of low quality (e.g., drcount > 0) software correctly classified = $C_{22}/n_2$   (7)

For the example, LQC = (541/576)*100 = 93.92%.

Second, we estimate the metrics' ability to correctly classify quality, given that the BDF has classified modules as ACCEPT. This is done by summing quality factor in the ACCEPT column in Table 2 to produce Remaining Factor, RF (e.g., remaining drcount), given by equation (8).

$$
RF = \sum_{i=1}^{n} F_i \ \text{ for } \big((F_i > FC) \wedge (M_{i1} \le MC_1) \ldots \wedge (M_{ij} \le MC_j) \ldots \wedge (M_{im} \le MC_m)\big) \tag{8}
$$

for j = 1,...,m. This is the sum of $F_i$ (e.g., drcount) on modules incorrectly classified as high quality because, for these modules, $(F_i > FC) \wedge (M_{ij} \le MC_j)$.

We estimate the proportion of RF by equation (9), where TF is the total $F_i$ for the build.

RFP = RF/TF   (9)

For the example, from Table 2 there are 56 DRs on 35 modules that are incorrectly classified (i.e., RF = 56). The total number of DRs for the 1397 modules is 2579. Therefore, RFP = (56/2579)*100 = 2.17%.


3.2.5.  Inspection 

Inspection is one of the costs of high quality. We are interested in weighing inspection requirements (i.e., percent of modules rejected and subjected to detailed inspection) against the quality that is achieved, for various BDFs. We estimate inspection requirements by noting that all modules in the REJECT column of Table 2 must be inspected; this is the count $C_{12} + C_{22}$. Thus, the proportion of modules that must be inspected is given by:

$$
I = (C_{12} + C_{22})/n \tag{10}
$$

For the example, I = ((344 + 541)/1397)*100 = 63.35% and the percentage accepted is 1 - I = 36.65%.
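The application criteria of equations (7), (9), and (10) reduce to three ratios. The sketch below recomputes them from the Build 1 values quoted in the text; the function name `application_criteria` and its parameter names are assumptions introduced for illustration.

```python
def application_criteria(c12: int, c21: int, c22: int, n: int, rf: float, tf: float):
    """Return (LQC, RFP, I) as proportions, per equations (7), (9), and (10)."""
    n2 = c21 + c22            # all low quality modules
    lqc = c22 / n2            # low quality modules correctly rejected
    rfp = rf / tf             # share of the quality factor left on accepted modules
    insp = (c12 + c22) / n    # share of modules that must be inspected
    return lqc, rfp, insp


# Build 1 example: C12 = 344, C21 = 35, C22 = 541, n = 1397, RF = 56, TF = 2579.
lqc, rfp, insp = application_criteria(344, 35, 541, 1397, 56, 2579)
print(f"LQC = {100 * lqc:.2f}%")            # 93.92%
print(f"RFP = {100 * rfp:.2f}%")            # 2.17%
print(f"I   = {100 * insp:.2f}%")           # 63.35%
print(f"accepted = {100 * (1 - insp):.2f}%")  # 36.65%
```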

3.2.6.  Summary  of  Validation  Results 

Table  3  summarizes  the  results  of  the  validation  ex¬ 
ample.  The  properties  of  dominance  and  concordance  are 
evident  in  these  validation  results  and  in  other  data  we 
have  analyzed  from  the  Space  Shuttle.  That  is,  a  point  is 
reached  in  adding  metrics  where  Discriminative  Power  is 
not  increased  because:  1)  the  contribution  of  the  dominant 
metrics  in  correctly  classifying  quality  has  already  taken 
effect  and  2)  additional  metrics  essentially  replicate  the 
classification  results  of  the  dominant  metrics  —  the  con¬ 
cordance  effect.  This  result  is  due  to  the  property  of  the 
BDF  used  as  an  OR  function,  causing  a  module  to  be  re¬ 
jected  if  only  one  of  its  metrics  exceeds  its  critical  value. 

3.3.  Stage  3:  Apply  a  Stopping  Rule  for  Adding 
Metrics 

It is important to strike a balance between quality and cost (i.e., between RFP and I). Thus we add metrics until the ratio of the relative change in RFP to the relative change in I is maximum, as given by the Quality Inspection Ratio in equation (11), where i refers to the previous RFP and I:

$$
QIR = \frac{|\Delta RFP| / RFP_i}{\Delta I / I_i} \tag{11}
$$

For the example, QIR(P,S → P,S,E2) = (|2.17 - 2.95|/2.95)/((63.35 - 60.13)/60.13) = 4.91. Therefore, we stop adding metrics after eta2 (E2) has been added.
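A minimal sketch of the Quality Inspection Ratio follows, using the rounded percentages quoted in the example. The function name `quality_inspection_ratio` is an assumption. With these rounded inputs the ratio comes out at about 4.94, close to the reported 4.91; the small gap presumably reflects rounding of the published inputs, which is an assumption on our part.

```python
def quality_inspection_ratio(rfp_prev: float, rfp_new: float,
                             i_prev: float, i_new: float) -> float:
    """Relative change in RFP divided by relative change in I (equation 11)."""
    rel_rfp = abs(rfp_new - rfp_prev) / rfp_prev
    rel_i = (i_new - i_prev) / i_prev
    return rel_rfp / rel_i


# Going from metric set (P, S) to (P, S, E2) on Build 1.
qir = quality_inspection_ratio(rfp_prev=2.95, rfp_new=2.17,
                               i_prev=60.13, i_new=63.35)
print(f"QIR = {qir:.2f}")
```

In a stopping-rule loop, one would compute this ratio each time a metric is added and stop at the metric set where it peaks.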

3.3.1.  Comparison  of  BDF  Validation  with  Applica¬ 
tion  Results 

In order to compare validation with application results, we first show how the BDF table looks in the Design phase of Build 2 in Figure 1, when only the metrics $M_{ij}$ and their critical values $MC_j$ are available. This is shown in Table 4, where the "?" indicates that the quality factor data $F_i$ are not available when the validated metrics are used in the quality control function of Build 2. During the Design phase of Build 2, modules are classified according to the criteria that have been described. Whereas 36.65% (512/1397) and 63.35% (885/1397) of the modules were accepted and rejected, respectively, during Build 1 (see Table 2), 26.95% (228/846) and 73.05% (618/846) of the modules were accepted and rejected, respectively, during Build 2 (see Table 4). The rejected modules would be given priority attention (i.e., subjected to rigorous inspection).

A comparison of the Validation (Build 1) with the Application (Build 2) with respect to statistical and application criteria is shown in Table 5. To have a basis for
comparison  with  the  validation  results,  we  computed  the 
values  shown  in  Table  5  retrospectively  (i.e.,  after  Build  2 
was  far  enough  along  to  be  able  to  collect  all  of  the  quality 
factor  data  at  the  conclusion  of  the  Test  phase).  The  values 
for  Build  2  are  the  actual  quality  delivered  to  maintenance, 
as  shown  during  the  Test  phase  of  Figure  1.  The  results  of 
the  two  builds  are  comparable.  Note  that  the  same  critical 
values  computed  during  Build  1  were  used  on  Build  2. 
This procedure is necessary because the quality factor data that are used in the K-S test in Stage 1 are not available during the Design Phase of Build 2 in Figure 1. This transferability of model parameters is key to our process because the point of validation is to apply its results to other but similar software when the quality factor data are not available for the latter. Also, we have found that to apply this
approach,  Build  2  does  not  have  to  be  a  direct  descendant 
of  Build  1.  Builds  1  and  2  do  not  have  this  relationship. 

3.4.  Stage  4:  Form  a  Set  of  Relative  Critical 
Value  Metrics  (RCVD) 

Granularity  of  data  is  an  issue  that  does  not  seem  to 
have  been  discussed  much  in  the  literature  but  one  that  we 
have  found  to  be  of  great  importance  in  metrics  analysis. 
By  granularity  we  refer  to  the  level  of  data  (e.g.,  module, 
module  sets,  build)  that  will  yield  useful  results  when  the 
data  are  used  in  a  model.  This  was  an  issue  in  our  research 
to  develop  an  RCVD  suitable  for  use  as  a  second  level 
discriminant  in  controlling  and  predicting  quality.  By  sec¬ 
ond  level  we  mean  that  the  RCVD  comes  into  play  after 
the  optimal  BDF  has  done  its  job  of  either  accepting  or 
rejecting  a  module.  Although  the  BDF  is  very  useful,  it 
does  not  indicate  the  degree  of  quality  (e.g.,  number  of 
DRs)  on  a  rejected  module  or  set  of  rejected  modules.  Our 
original  objective  was  to  provide  discrimination  at  the 
module  level  (i.e.,  rank  the  drcount  in  modules  by 
RCVD).  Due  to  the  large  number  of  modules  with  zero 
DRs  (58.77%  and  50.59%  for  Build  1  and  Build  2,  re¬ 
spectively)  and  the  large  variability  of  the  data,  this  did 
not  prove  feasible.  However,  by  sorting  the  modules  by 
RCVD  and  finding  its  percentile  distribution  and  the  cor¬ 
responding  drcount  percentile  distribution,  we  were  able 
to  identify  key  points  in  the  plots  of  these  distributions. 
We  call  these  points  break  points.  These  are  points  in  the 
percentile  distributions  where  the  slope  of  the  percentile 
curve  starts  to  increase  sharply.  An  example  is  shown  in 


Figure 3, where percentile drcount is plotted against percentile prologue size. A break point occurs at the .80 percentile (80%) on the X-axis. This corresponds to RCVD (prologue size) = .0517. This value corresponds to a Y-axis value of .35 (35%). Thus for values of RCVD greater than .0517, we estimate that the RCVD would identify 65% of the drcount. Thus we see that a difference of only .20 percentile (1.00 - .80) of the RCVD accounts for a difference of .65 percentile (1.00 - .35) of the drcount. In order to implement this process, we validate function (12) for sets of metrics during the Test Phase of Build 2, in Figure 1, when the quality factor data $F_i$ are available. Then we apply function (12) during the Design Phase of Build 2, when no quality factor data are available for Build 2.

$$
\vee (M_{ij} > MC_j) \wedge RCVD_{ij} \tag{12}
$$

This  means  that  in  addition  to  rejecting  modules  —  the 
function  performed  by  the  BDF  -  there  is  further  classifi¬ 
cation  performed  by  the  RCVD.  Any  modules  that  evalu¬ 
ate  to  true  in  (12),  would  receive  special  attention  because 
the  likelihood  is  that  they  would  contain  multiple  DRs. 
This is illustrated in Table 6 where 65.37% of the drcount is identified by RCVD (prologue size) in combination with the BDF on Build 1, corresponding to a drcount density of 6.02. This is in contrast with a density of .80 on modules where (12) does not evaluate to true and 2.85
when  the  BDF  alone  is  used.  Similar  results  are  observed 
for  Build  2  in  Table  6.  These  results  indicate  the  quality 
that  would  be  delivered  to  maintenance  unless  action  is 
taken  in  inspection  and  test  to  correct  the  defects. 
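The second-level discriminant can be sketched as follows. The RCVD definition, (M_ij - MC_j)/(MX_j - MN_j), is taken from Figure 2, reading MX_j and MN_j as the maximum and minimum of metric j; the function names, the hypothetical prologue size range (0 to 1000), and the sample modules are assumptions made purely for illustration. The critical values and the .0517 break point are those reported for Build 1.

```python
from typing import Sequence


def rcvd(m: float, mc: float, mn: float, mx: float) -> float:
    """Relative Critical Value Deviation of one metric value (Figure 2)."""
    return (m - mc) / (mx - mn)


def needs_special_attention(metrics: Sequence[float],
                            critical: Sequence[float],
                            rcvd_value: float,
                            break_point: float) -> bool:
    """Function (12): BDF rejection AND RCVD above the break point."""
    bdf_reject = any(m > mc for m, mc in zip(metrics, critical))
    return bdf_reject and rcvd_value > break_point


MC = (63, 27, 45)        # critical values for P, S, E2 (from the text)
BREAK_POINT = 0.0517     # Build 1 break point for RCVD (prologue size)
P_MIN, P_MAX = 0, 1000   # hypothetical range of prologue size

for module in [(150, 20, 10), (60, 10, 5)]:   # hypothetical (P, S, E2) values
    r = rcvd(module[0], MC[0], P_MIN, P_MAX)
    print(module, round(r, 4), needs_special_attention(module, MC, r, BREAK_POINT))
```

Modules for which the function returns true are the ones the text singles out for special attention, since they are likely to carry multiple DRs.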

We  experimented  with  using  all  six  metrics  of  Table  1 
in  the  RCVD.  We  used  all  six  in  order  to  have  sufficient 
data to make the computation feasible. RCVD (six metrics) was worse than RCVD (prologue size), as can be seen in Table 6, in terms of both percentage of drcount classified and drcount density. Since RCVD (prologue size) is much easier to compute, it was the preferred RCVD to apply to Build 2,
as  shown  in  Table  6.  This  result  is  due  to  the  dominance 
and  concordance  properties  of  metrics  mentioned  earlier. 
In  addition,  the  result  is  due  to  the  fact  that  prologue  size 
contains  a  thorough  change  history  comprised  of  the  fol¬ 
lowing  notations  in  the  program  listing:  module;  purpose 
of  the  module;  specification  reference;  change  request; 
discrepancy  report;  release;  release  date;  revision  level; 
programmer;  description  of  change;  listing  of  statements 
affected  by  the  change;  indication  of  whether  a  statement 
is  added,  deleted,  or  changed;  and  program  comments.  We 
use  prologue  size  as  a  predictor  of  drcount  in  the  aggre¬ 
gate  (i.e.,  the  cumulative  quantity  of  entries  in  the  pro¬ 
gram),  not  on  a  one-for-one  basis  of  a  change  possibly 
resulting  in  a  DR. 

A  seemingly  trivial  but  yet  important  aspect  of  this 
stage  of  the  analysis  was  demonstrating  the  usefulness  of 
sorting  data  to  examine  their  distributions  and  the  flexibil¬ 
ity  for  doing  this  provided  by  a  spreadsheet  program. 
3.5. Stage 5: Identify Quality Limit Predictors

The final stage of the analysis involves identifying regression equations for predicting the average and limits of quality (e.g., drcount) of module sets, $F_i = f(M_{ij})$, during the Test Phase of Build 1, as shown in Figure 1. This process is desirable because BDFs and RCVDs are not capable of predicting quality limits. During the Test phase of
Build  1,  regression  coefficients  are  estimated  and  the  re¬ 
sultant  equation  is  applied,  during  the  Design  Phase  of 
Build  2,  to  predict  the  quality  limits  that  would  be  deliv¬ 
ered  to  maintenance  unless  action  is  taken  to  correct  the 
defects.  As  in  the  case  of  forming  the  RCVDs,  granularity 
of  data  was  an  issue.  Again,  because  of  the  large  number 
of  modules  with  zero  drcount  and  the  large  variability  of 
the  data,  prediction  at  the  individual  module  level  was  not 
feasible. However, we applied our earlier regression work for the Space Shuttle [13], where we found that if we divided the data into the appropriate number of frequency classes (i.e., module sets), according to Sturges' rule [14], usable regression equations could be developed based on the averages computed for the classes. In that work, we only predicted average values. We now extend the approach to include predicting quality limits. We experimented with various sets of predictor variables. The model results are shown in Table 7. The equation we selected is the exponential function using average statements (ave S):

$$
\text{ave}\,drcount = \exp(0.1137 + 0.0056697 \cdot \text{ave}\,S) \tag{13}
$$

This equation was selected for application to Build 2 for the following reasons: 1) lowest Mean Square Error (MSE) in Table 7; 2) fair accuracy in predicting Build 1 drcount; 3) theoretical consideration that the rate of change of drcount with module size would vary with module size (a property of the exponential function); and 4) the relative ease of collecting size data. Although the F-ratio and R² are impressive for the linear function using nodes, this equation has a relatively high MSE and the collection of nodes requires the use of a metrics analyzer.
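Applying the selected exponential equation is straightforward. The sketch below uses the coefficients reported in the text (0.1137 and 0.0056697); the function names, the set size default of 100 modules, and the sample average statement counts are assumptions made for illustration and are not data from the study.

```python
import math
from typing import Sequence


def predict_ave_drcount(ave_s: float) -> float:
    """Predicted average drcount for a module set with average statement count ave_s."""
    return math.exp(0.1137 + 0.0056697 * ave_s)


def predict_build_drcount(ave_s_per_set: Sequence[float], set_size: int = 100) -> float:
    """Sum the predicted drcount over module sets of equal size (an assumed aggregation)."""
    return sum(set_size * predict_ave_drcount(s) for s in ave_s_per_set)


# Hypothetical average statement counts for three sets of 100 modules.
for ave_s in (10.0, 40.0, 120.0):
    print(f"ave S = {ave_s:6.1f} -> predicted ave drcount = {predict_ave_drcount(ave_s):.3f}")
print(f"predicted build drcount = {predict_build_drcount([10.0, 40.0, 120.0]):.1f}")
```

Because the function is exponential, the predicted drcount grows faster for larger module sets, which is the size-dependent rate of change cited as reason 3 above.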

Prediction  results  are  shown  in  Figures  4  —  7.  The 
figures  show  the  following  for  average  drcount  for  sets  of 
100  modules  (1  —  100,  101  —  200,  etc.):  Figure  4,  actual 
and  predicted  values  for  Build  1;  Figure  5,  actual  and  pre¬ 
dicted  limits  for  Build  1;  Figure  6,  actual  and  predicted 
values  for  Build  2;  and  Figure  7,  actual  and  predicted 
limits  for  Build  2.  Figure  7  shows  that  the  prediction  lim¬ 
its  bracket  the  actual  values  for  Build  2.  This  is  another 
example of retrospective analysis: once the quality factor data $F_i$ are available during the Test Phase of Build 2, Figure 1, the actual drcount can be compared with the predictions. In the application of the prediction equation, the
software  manager  would  compute  the  average  size  of  sets 
of  modules  and  predict  the  drcount  and  the  limits  of 
drcount  for  each  module  set,  as  shown  in  Figures  6  and  7, 
respectively. 


4.  Summary  and  Conclusions 

We  developed  a  new  metric,  Relative  Critical  Value 
Deviation  (RCVD),  for  classifying  and  predicting  software 
quality.  When  the  granularity  of  data  was  considered,  the 
RCVD  proved  to  be  a  useful  indicator  of  the  degree  to 
which  software  quality  deviates  from  a  specified  norm. 
We  discovered  that  the  major  application  of  the  RCVD 
was  to  prioritize  the  effort  required  to  achieve  quality 
goals.  At  the  outset  we  posed  several  questions  that  the 
software  manager  wants  answered  concerning  software 
quality.  We  provided  an  integrated  set  of  models  based  on 
Boolean  discriminant  functions,  RCVDs,  and  regression 
equations  to  address  these  questions.  We  made  a  thorough 
evaluation  of  two  builds  -  one  was  used  for  validation  and 
the  other  for  application  -  using  a  five-stage  analysis  ap¬ 
proach.  In  the  three  areas  of  our  modeling  effort,  the  pre¬ 
dictions  for  the  application  build  were  close  to  the  actual 
values.  Based  on  these  preliminary  results  and  the  fact  that 
we  have  done  analysis  on  additional  Space  Shuttle  data, 
we  feel  that  the  models,  not  the  specific  numerical  results, 
are  transferable  to  other  organizations,  if  the  models  are 
applied  within  and  not  across  application  domains.  How¬ 
ever,  to  increase  our  confidence  in  the  results,  in  future 
research  we  will  examine  several  additional  builds  of  the 
Space  Shuttle  flight  software.  Finally,  we  found  that  mun¬ 
dane  aspects  of  the  analysis  like  data  sorting  to  discover 
information  about  distributions  of  data  and  the  use  of 
spreadsheet  calculations  significantly  aided  the  analysis. 
Table 1: Kolmogorov-Smirnov Distance for drcount = 0 vs. drcount > 0
Validation: Build 1 (n = 1397 modules)

Metric (symbol)      Definition (counts per module)                 Distance
prologue size (P)    change history line count in module listing    0.592
statements (S)       executable statement count                     0.505
eta2 (E2)            unique operand count                           0.472
loc (L)              non-commented lines of code count
eta1 (E1)            unique operator count
nodes (N)            node count (in control graph)

Table 2: Boolean Discriminant Function: Validation (Build 1)

                          ACCEPT                             REJECT
                          ∧(M_ij ≤ MC_j)                     ∨(M_ij > MC_j)
                          P_i ≤ 63 ∧ S_i ≤ 27 ∧ E2_i ≤ 45    P_i > 63 ∨ S_i > 27 ∨ E2_i > 45    Total
High Quality
F_i ≤ FC (drcount ≤ 0)    C11 = 477                          C12 = 344 (Type 2)                 n1 = 821
Low Quality
F_i > FC (drcount > 0)    C21 = 35 (Type 1)                  C22 = 541                          n2 = 576
Total                     N1 = 512                           N2 = 885                           n = 1397
Quality factor            RF = 56                                                               TF = 2579

Table 3: Discriminative Power Validity Evaluation (Build 1, n = 1397 modules)

                 Critical Values
Metric Set       P      S      E2       LQC %     RFP %
P                63                     84.90     6.?3
P, S             63     27              92.19     2.95
P, S, E2         63     27     45       93.92     2.17
P, S, E2, L      63     27     45       95.14     1.78
K-S Distance     0.592  0.505  0.472

P: prologue size, S: statements, E2: eta2, L: lines of code

Table 4: Boolean Discriminant Function: Application (Build 2)

                   ACCEPT                             REJECT
                   ∧(M_ij ≤ MC_j)                     ∨(M_ij > MC_j)
                   P_i ≤ 63 ∧ S_i ≤ 27 ∧ E2_i ≤ 45    P_i > 63 ∨ S_i > 27 ∨ E2_i > 45
High Quality: ?
Low Quality: ?
Total              N1 = 228                           N2 = 618


Table 5: Comparison of Validation (Build 1, n = 1397 modules) with Application (Build 2, n = 846 modules)

                         Critical Values    Statistical Criteria    Application Criteria
Metric Set               P    S    E2       p1 %     p2 %           LQC %    RFP %    QIR     I %
Validation P, S, E2      63   27   45       2.51     24.62          93.92    2.17     4.91    63.35
Application P, S, E2     63   27   45       3.07     26.71          93.78    2.69     9.11    73.05

P: prologue size, S: statements, E2: eta2

Table 6: Comparison of Relative Critical Value Deviation (RCVD) Discriminative Power

                                        Build 1 (Validation)                          Build 2 (Application)
                                        RCVD (six metrics)    RCVD (prologue size)    RCVD (prologue size)
.80 Percentile RCVD Value (Break Point) .1026                 .0517                   .0777
BDF ∧ RCVD                              BDF ∧ (RCVD > .1026)  BDF ∧ (RCVD > .0517)    BDF ∧ (RCVD > .0777)
drcount identified (percent)            1400 (54.28)          1686 (65.37)            1002 (62.74)
modules with drcount identified
(percent)                               263 (18.83)           280 (20.04)             173 (20.45)
drcount density (drcount/module)        5.32                  6.02                    5.79
drcount density for other modules       1.04                  .80                     .88
BDF alone: drcount density              2.85 (Build 1)                                2.51 (Build 2)

BDF: ((P>63)∨(S>27)∨(E2>45))

1. RCVD (six metrics): mean of RCVDs of six metrics in Table 1
2. drcount identified: count of DRs on modules rejected by BDF ∧ RCVD; percent of total DRs
3. modules with drcount identified: count of modules rejected by BDF ∧ RCVD; percent of total modules
4. drcount density: drcount/module count
5. drcount density for other modules: modules other than those rejected by BDF ∧ RCVD

Table 7: Regression Equation Summary for Predicting ave drcount

Predictor                                                  Mean        Predicted        Actual
Variables     Type          F        R2      MSE      Residual    Build drcount    Build drcount

Build 1: Validation
aveS          Exponential   56.94    .851    0.702    .0000       2377             2579
aveN          Linear        283.13   .966    1.545    .0000       2241             2579
aveS, aveN    Exponential   39.84    .899    0.754    .0000       2404             2579

Build 2: Application
aveS          Exponential   56.94    .851    0.437                1637             1597

S: statements, N: nodes, MSE: mean square error computed between predicted and actual drcount

[Figure 1 diagram: a Development timeline covering Build 1: Validation (Design, Test) and Build 2: Application (Design, Test), followed by Maintenance of Build 2. Legend:
M_ij: Metric j on Module i
MC_j: Metric j Critical Value
F_i: Quality Factor on Module i
RCVD_ij: Relative Critical Value Deviation for Metric j on Module i
F_i = f(M_ij): Quality Limits Predictor]

Figure 1. Measurement Process
[Figure 2 diagram: a Metric Scale running from MIN (MN_j) through the critical value MC_j to MAX (MX_j) in the direction of increasing size and complexity, and a Quality Scale running from FN through FC to FX in the direction of decreasing quality. The degree of quality degradation is RCVD_ij = (M_ij - MC_j)/(MX_j - MN_j). Legend:
M_ij: Metric j on Module i
MC_j: Metric j Critical Value
F_i: Quality Factor on Module i
FC: Quality Factor Critical Value
RCVD_ij: Relative Critical Value Deviation for Metric j on Module i]

Figure 2. Quality Thermometer

[Plot: drcount vs. prologue size RCVD (Build 1)]

Figure 3. drcount and prologue size RCVD percentiles

[Plot: actual vs. predicted drcount (100 module sets), Build 1; X-axis: module set number]

Figure 4. Predicted vs. Actual drcount (Build 1)

[Plot: actual vs. predicted drcount (100 module sets), Build 1; X-axis: module set number]

Figure 5. Predicted Limits vs. Actual drcount (Build 1)

[Plot: actual vs. predicted drcount (100 module sets), Build 2; X-axis: module set number]

Figure 6. Predicted vs. Actual drcount (Build 2)

[Plot: actual vs. predicted drcount (100 module sets), Build 2]

Figure 7. Predicted Limits vs. Actual drcount (Build 2)
Keynote Talk


Investigation  of  the  Risk  to  Software  Reliability  of  Requirements  Changes 
The 1999 NASA Workshop on Risk Management, Morgantown, West Virginia, October 28-29, 1999, 13 pages.


Norman  F.  Schneidewind 
BACKGROUND 

While  software  design  and  code  metrics  have  enjoyed  some  success  as  predictors  of  software 
quality attributes such as reliability [KH0961, KH0962, LAN95, MUN96, OHL96], the
measurement  field  is  stuck  at  this  level  of  achievement.  If  measurement  is  to  advance  to  a  higher 
level,  we  must  shift  our  attention  to  the  front-end  of  the  development  process,  because  it  is 
during  system  conceptualization  that  errors  in  specifying  requirements  are  inserted  into  the 
process.  A  requirements  change  may  induce  ambiguity  and  uncertainty  in  the  development 
process  that  cause  errors  in  implementing  the  changes.  Subsequently,  these  errors  propagate 
through  later  phases  of  development  and  maintenance.  These  errors  may  result  in  significant 
risks  associated  with  implementing  the  requirements.  For  example,  reliability  risk  (i.e.,  risk  of 
faults  and  failures  induced  by  changes  in  requirements)  may  be  incurred  by  deficiencies  in  the 
process  (e.g.,  lack  of  precision  in  requirements).  Although  requirements  may  be  specified 
correctly  in  terms  of  meeting  user  expectations,  there  could  be  significant  risks  associated  with 
their  implementation.  For  example,  correctly  implementing  user  requirements  could  lead  to 
excessive system size and complexity with adverse effects on reliability or there could be a
demand  for  project  resources  that  exceeds  the  available  funds,  time,  and  personnel  skills. 
Interestingly,  there  has  been  considerable  discussion  of  project  risk  (e.g.,  the  consequences  of 
cost overrun and schedule slippage) in the literature [BOH91] but not a corresponding attention
to  reliability  risk. 

Risk in the Webster's New Universal Unabridged Dictionary is defined as: "the chance of injury, damage, or loss" [WEB79]. Some authors have extended the dictionary definition as follows: "Risk Exposure = Probability of an Unsatisfactory Outcome * Loss if the Outcome is Unsatisfactory" [BOH91]. Such a definition is frequently applied to the risks in managing
software  projects  such  as  budget  and  schedule  slippage.  In  contrast,  our  application  of  the 
dictionary  definition  pertains  to  the  risk  of  executing  the  software  of  a  system  where  there  is  the 
chance  of  injury  (e.g.,  crew  injury  or  fatality),  damage  (e.g.,  destruction  of  the  vehicle),  or  loss 
(e.g.,  loss  of  the  mission)  if  a  serious  software  failure  occurs  during  a  mission.  We  use  risk 
factors  to  indicate  the  degree  of  risk  associated  with  such  an  occurrence. 

The  generation  of  requirements  is  not  a  one-time  activity.  Indeed,  changes  to  requirements 
can  occur  during  maintenance.  When  new  software  is  developed  or  existing  software  is  changed 
in  response  to  new  and  changed  requirements,  respectively,  there  is  the  potential  to  incur 
reliability  risks.  Therefore,  in  assessing  the  effects  of  requirements  on  reliability,  we  should  deal 
with  changes  in  requirements  throughout  the  life  cycle. 
[...] requirements and reliability, there are the intermediate relationships between requirements and complexity and between complexity and reliability. These relationships may interact to put the reliability of software at risk because the requirements changes may result in increases in the size and complexity of the software that may adversely affect reliability.


OBJECTIVES  AND  EXPECTED  SIGNIFICANCE 


Objectives 


[...] on the critical role that requirements play in determining reliability, combined with our [...] interest in metrics and reliability, we are motivated to investigate the following issues:

- What is the relationship between requirements attributes and reliability? That is, are there requirements attributes that are strongly related to the occurrence of defects and failures in the software?

- What is the relationship between requirements attributes and software attributes like complexity and size? That is, are there requirements attributes that are strongly related to the complexity and size of software?

- [...] predictors [...]? That is, [...] reliability in execution (e.g., time to next failure, number of failures)?

- [...] between high and low [...]?

- Which requirements attributes pose the greatest risk to reliability?

[...] that [...] researchers could use to 1) [...] requirements changes, complexity, and reliability, and 2) assess and predict reliability risk?

[...] predict reliability risk as a function of requirements changes.

Significance 

[...] software engineering lacks the capability to quantitatively assess and predict the effect of a requirements change on the reliability of the software. Much of the research [...] concerns the measurement of code characteristics [NIK98]. This is satisfactory for [...] and process effectiveness once [...]. [...] measurement plans that are limited to measuring code, they will be deficient in the following ways: [...], lack coverage (e.g., no requirements analysis and design), and start too late in the process. For a measurement plan to be effective, it must start with requirements and continue through to [...]. Because requirements characteristics directly affect code characteristics and reliability, it is important to assess their impact on reliability when requirements are specified.


RESEARCH  PLAN 

mappfaVteweeVchanoet  tn  Zt'tlTT™  “  See  whe,her  il  is  ^sible  «o  develop  a 
requirements, 'c  repre^ntr^rnpl^xh^ani^/0^^111^  ^represents 

^uc*e  doTumentadoi^il^ch^^e^nTomplexity.^3^***1^ 

abi^’^rsrsr 1 ;“^nYrrn? and  reiiabi%- we  wii1  be 

function of requirements changes. In order to quantify the effect of a requirements change, we will use risk factors that are defined as the attribute of a requirement change that can induce reliability risk. Various examples of risk factors are given in the RISK FACTORS section. We propose to statistically analyze specified risk factors to see in what way, if any, they are associated with reliability. In particular, we want to identify those factors that have the strongest association with reliability. In addition to risk factors, we can also use the number of requirements change requests on modules (see Table 4, Data Sources section). The number and occurrence of these requests will be considered as additional potential risk factors.

Experiments 

r  whe,hr  ar=“  *  «■*»«*  «* 

Examples of the data that would be used are shown in the DATA SOURCES section.


Discriminant  Analysis 


To test the null hypothesis H0: a risk factor is not a discriminator of reliability, versus the alternate hypothesis H1: a risk factor is a discriminator of reliability, we will use categorical data analysis and discriminant analysis. A similar hypothesis will be tested to see whether risk factors serve as discriminators of complexity. We will use the rich set of data that we have from the Shuttle to develop a discriminant function comprised of a linear or Boolean function of the risk factors. We will evaluate this function to see whether it can discriminate between requirements changes that caused one or more failures and those that did not. We will draw on our experience applying categorical data analysis and discriminant analysis to software, based on using Boolean discriminant functions that are comprised of both a set of metrics and corresponding critical values [SCH971].

Because risk factors may interact in some cases, a statistical categorical analysis will be performed. We will use only one category of risk factor at a time to see the effect of adding an additional discriminator on the ability to correctly classify modules.

deviations  between  specified  and  observed  softwar/SS^tee’thaT TmT  d°°UmentS 
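
A Boolean discriminant function of the kind mentioned above (a set of metrics paired with critical values) might be sketched as follows. This is only an illustration: the OR combination, the factor names, and every critical value below are assumptions made for the example, not values from the Shuttle data or from [SCH971].

```python
from typing import Mapping


def boolean_discriminant(factors: Mapping[str, float],
                         critical_values: Mapping[str, float]) -> bool:
    """Classify a requirements change as high risk (True) when ANY risk
    factor exceeds its critical value; otherwise low risk (False).

    The OR form is an assumed choice for this sketch.
    """
    return any(factors[name] > limit
               for name, limit in critical_values.items())


# Hypothetical critical values, chosen only to make the example concrete.
CRITICAL = {"sloc_changed": 500.0,
            "requirements_issues": 100.0,
            "modifications": 5.0}

# Two hypothetical change requests.
change_a = {"sloc_changed": 120.0, "requirements_issues": 238.0, "modifications": 7.0}
change_b = {"sloc_changed": 40.0, "requirements_issues": 3.0, "modifications": 1.0}

if __name__ == "__main__":
    print(boolean_discriminant(change_a, CRITICAL))  # flagged as high risk
    print(boolean_discriminant(change_b, CRITICAL))  # not flagged
```

Evaluating such a function against recorded outcomes (changes that did or did not cause failures) would give the misclassification counts needed to judge each discriminator.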
Trend  Analysis 

In order to analyze trends in reliability and process stability, we developed a generalized relative Change Metric (CM) that represents trend information (e.g., changes in reliability across releases). CM is independent of the scales of the measured quantities. Although looking at a graph is useful, it is not a precise way of measuring and comparing trends, particularly if the graph has peaks and valleys and the measurements are made at discrete points. We will use this metric to measure changes in risk factors, complexity, and reliability across releases or builds of the software and compare them to see whether trends are related (i.e., increases in risk factors are accompanied by increases in complexity and decreases in reliability). The following is an example of computing CM for a reliability metric:

1. Note the change in a metric from one release to the next (i.e., release j to release j+1).

2. a. If the change is in the desirable direction (e.g., Failures/KLOC decrease), treat the change in 1 as positive.

   b. If the change is in the undesirable direction (e.g., Failures/KLOC increase), treat the change in 1 as negative.

3. a. If the change in 1 is an increase, divide it by the value of the metric in release j+1.

   b. If the change in 1 is a decrease, divide it by the value of the metric in release j.

4. Compute the average of the values obtained in 3, taking into account sign. This is the change metric (CM). The standard deviation of these values can also be computed. The average of the CM for a set of metrics can be computed to obtain an overall change metric.

An example of calculating CM for MTTF and Failures/KLOC is shown in Table 1 for various Operational Increments (releases) of Shuttle software. An Operational Increment (OI) is a software system comprised of modules and configured from a series of builds to meet Shuttle mission functional requirements. Figure 1 shows the corresponding plot of Failures/KLOC across the releases.

Table 1. Example Computations of Change Metric (CM)

Operational   MTTF      Relative   Failures/KLOC   Relative
Increment     (Days)    Change                     Change

A             179.7                0.750
B             409.6      0.562     0.877           -0.145
C             406.0     -0.007     1.695           -0.483
D             192.3     -0.527     0.984            0.419
E             374.6      0.487     0.568            0.423
J              73.6     -0.805     0.238            0.581
O              68.8     -0.068     0.330           -0.272

CM                      -0.060                      0.087
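
The change metric computation can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' tool: the function names and the `higher_is_better` flag are assumptions introduced here, and the divisor rule (an increase is divided by the later release's value, a decrease by the earlier release's value) is inferred from the relative changes listed in Table 1.

```python
def relative_changes(values, higher_is_better):
    """Signed relative change between each pair of consecutive releases."""
    changes = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta == 0:
            changes.append(0.0)
            continue
        # Increase: divide by the value in release j+1.
        # Decrease: divide by the value in release j.
        magnitude = abs(delta) / (curr if delta > 0 else prev)
        # Positive sign when the metric moved in the desirable direction.
        desirable = (delta > 0) == higher_is_better
        changes.append(magnitude if desirable else -magnitude)
    return changes


def change_metric(values, higher_is_better):
    """CM: the average of the signed relative changes."""
    changes = relative_changes(values, higher_is_better)
    return sum(changes) / len(changes)


# Values as listed in Table 1, in release order.
MTTF_DAYS = [179.7, 409.6, 406.0, 192.3, 374.6, 73.6, 68.8]
FAILURES_PER_KLOC = [0.750, 0.877, 1.695, 0.984, 0.568, 0.238, 0.330]

if __name__ == "__main__":
    print(round(change_metric(MTTF_DAYS, higher_is_better=True), 3))
    print(round(change_metric(FAILURES_PER_KLOC, higher_is_better=False), 3))
```

With the Table 1 inputs this yields a CM of roughly -0.06 for MTTF and roughly 0.09 for Failures/KLOC. Small differences from the tabulated entries are expected because the table inputs are rounded.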

Reliability  Prediction 


reliawttt  we'will  P"™^1  with  ■*«  to  discriminating 

results as scale factors on reliability predictions. An example of this approach is shown in Figure 2, which is a plot of failure rate versus test time for one of the Shuttle OIs. The plot shows a decreasing trend, after an initial period of instability (i.e., increasing rate while learning how to maintain new software). Figure 2 shows that stability is achieved (i.e., failure rate asymptotically approaches zero with increasing test time). There are two types of
in  risk  increased^testing 

sz  tz^ 

srr*  XS"'1”"11  rff cte%rrrdw™;d 

For a given amount of test time, requirements changes that have lower risk would result in higher reliability. Alternatively, for a given failure rate, we would predict that requirements changes that have lower reliability risk would require less test time to achieve the specified reliability goal.

RISK  FACTORS 


" in  makr  ^ 

without an accompanying risk assessment. During risk assessment, the development contractor will attempt to answer such questions as: "Is this change highly complex relative to other software changes that have been made on the Shuttle?"

assurance  that  there  are  no  unacceptable  risks  in  mat™6”15  ,Chang6S  0r’  conversely,  providing 

"—on  .0  determine  „t, he/  ^T'  "7^  ,here  has  bae"  » 

less reliable and maintainable than low risk factor software. In addition, there is no model for predicting the reliability and maintainability of the software, if the change is implemented. Our research will address both of these issues.

etc  Trisk  completeness,  consistency,  correctness, 

quantify. Although some of the following risk factors have qualitative values assigned, there are a number of quantitative factors, and many of the factors deal with the execution behavior of the software (i.e., reliability).

We have placed the factors into categories and have provided our interpretation of the question each factor is designed to answer. In addition, we added the risk factor requirements specification techniques because we feel that this one could represent the highest reliability risk of all the factors if a technique leads to misunderstanding of the intent of the requirements.

Shuttle Flight Software Requirements Change Risk Factors

If the answer to a yes/no question is "yes", this is a high-risk change with respect to the given factor. If the answer to a question that requires an estimate is an anomalous value, this is a high-risk change with respect to the given factor.

Complexity Factors

o Qualitative assessment of complexity of change (e.g., very complex)

- Is this change highly complex relative to other software changes that have been made on the Shuttle?

o Number of modifications or iterations on the proposed change

- How many times must the change be modified or presented to the Change Control Board (CCB)?

Size Factors

o Number of lines of code affected by the change

- How many lines of code must be changed to implement the change?

o Size of data and code areas affected by the change

- How many bytes of existing data and code are affected by the change?

Criticality of Change Factors

o Whether the change is on a nominal or off-nominal program path (i.e., exception condition)

- Will a change to an off-nominal program path affect the reliability of the software?

o Operational phases affected (e.g., ascent, orbit, and landing)

- Will a change to a critical phase of the mission (e.g., ascent and landing) affect the reliability of the software?

Locality of Change Factors

- Will changes to the code in the affected area lead to non-maintainable code?

o New or existing code that is affected

- Will a change to new code (i.e., a change on top of a change) lead to non-maintainable code?

o Number of system or hardware failures that would have to occur before the code is executed

Requirements Issues and Function Factors

o Number of requirements affected by the given requirement change (requirements issues)

o Possible conflicts among requirements changes (requirements issues)

- Will this change conflict with other requirements changes (e.g., lead to conflicting operational scenarios)?

o Number of principal software functions affected by the change

Performance Factors

o Amount of memory required to implement the change

- Will the change use memory to the extent that other functions will not have sufficient memory to operate effectively?

o Effect on CPU performance

- Will the change use the CPU to the extent that other functions will not have sufficient CPU capacity?

Personnel Resources Factors

o Number of inspections required to approve the change

Tools Factors

- Will the change require the development and testing of new tools?

o Requirements specifications techniques (e.g., flow diagram, state chart, pseudo code, control diagram)

- Will the requirements specification method be difficult to understand and translate into code?

DATA  SOURCES 


We have access to several sets of data from the Space Shuttle of the following types:

*>“«**»  >983  to  the  present.  An 
Texas).  Snown  ln  Table  2  (data  provided  by  US  Alliance,  Houston 


Failure  Found  On  Davs  from  R^i  ^ 

Operational  Increment  When  Failure  Occurred  Report#^  °ate  ReIease  Module 

75  "°402  2  05.,  9-97  03-05-97  T 

on 10/18/95 and 3/5/97, respectively. An example of a partial set of risk factor data is shown in Table 3 (data provided by US Alliance, Houston, Texas).

Table 3

Change Request Number                        107734
SLOC Changed
Complexity Rating of Change
Criticality of Change
Number of Principal Functions Affected       27
Number of Modifications of Change Request    7
Number of Requirements Issues                238
Number of Inspections Required               12
Manpower Required to Make Change             209.3 MW


An example of a partial set of metrics data is shown in Table 4 (data provided by Prof. John Munson, University of Idaho).

Table 4

Module   Operator   Operand   Statement   Path    Cycle   Discrepancy    Change Request
         Count      Count     Count       Count   Count   Report Count   Count

                              606         998

discriminate  between  levels  of  rtiaMto  ai^c(!l!,I?tll^S1S  the  ability  of  risk  factors  to 
2000  system  -  the  latest  £££  ScL TM t™!™*  ”*  **“  ^  Pr°Pulsion  Laboratoty  X- 
the  software  development  team  and  testers  to  estehlkh  Provldes  a  rare  opportunity  to  work  with 
a  project  as  opposed  the  usual  sLaSh“  i~7n “  ^  ^  °f 

in an on-going project. We plan to instrument the software system for obtaining measurements throughout the development and maintenance process.


LONG-TERM GOALS

This research is another in the series of our measurement projects that has included software reliability modeling and prediction, metrics analysis, risk analysis, and maintenance stability analysis [SCH98]. We have been involved in software reliability modeling for many years [SCH93]. Our models, as is the case in general in software reliability, use failure data as the basis for prediction. This approach has the advantage of using a metric that represents the dynamic behavior of the software. However, this data is not available until the test phase. Predictions at this phase are useful, but it would be much more useful to predict at an earlier phase, preferably during requirements analysis, when the cost of error correction is relatively low. Thus there is interest in the software reliability and metrics field in using static attributes of software for reliability prediction.


Integrating  Risk  Analysis  with  Reliability  Prediction 


In order to integrate reliability prediction with risk analysis, we developed risk metrics that indicate the risk to the safety of software of not meeting remaining failures and time to next failure goals [SCH973]. For example, we have used the Schneidewind software reliability model for integrating reliability and reliability risk predictions for the Space Shuttle. This integration is described below.

Safety goals for a project can be expressed in terms of remaining failures $r(t_t)$ and time to next failure $T_F(t_t)$, predicted after the software is tested for total time $t_t$. Then the criteria for achieving these goals, for a given test time $t_t$ (execution or elapsed time), are as follows:


1) predicted remaining failures $r(t_t) < r_c$, where $r_c$ is a specified critical value, and

2) predicted time to next failure $T_F(t_t) > t_m$, where $t_m$ is mission duration.

Remaining  Failures  Risk  Metric 


Then we can formulate the normalized remaining failures risk metric as follows:

$(r(t_t) - r_c)/r_c = (r(t_t)/r_c) - 1$

As a function of test time, positive, zero, and negative values of this metric correspond to $r(t_t) > r_c$, $r(t_t) = r_c$, and $r(t_t) < r_c$, respectively; these regions correspond to critical, neutral, and desirable risk.

Time  to  Next  Failure  Risk  Metric 

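Similarly, we can formulate the time to next failure risk metric as follows:

$(t_m - T_F(t_t))/t_m = 1 - (T_F(t_t)/t_m)$

As a function of test time, positive, zero, and negative values of this metric correspond to $T_F(t_t) < t_m$, $T_F(t_t) = t_m$, and $T_F(t_t) > t_m$, respectively; in terms of risk, these regions correspond to critical, neutral, and desirable.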
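
The two risk metrics can be evaluated directly once predictions are available. The sketch below assumes the predictions $r(t_t)$ and $T_F(t_t)$ come from some reliability model that is not shown; the function names and the numeric inputs are hypothetical.

```python
def rcm_remaining_failures(r_tt: float, r_c: float) -> float:
    """Normalized remaining failures risk metric: (r(t_t) - r_c) / r_c."""
    return (r_tt - r_c) / r_c


def rcm_time_to_next_failure(tf_tt: float, t_m: float) -> float:
    """Time to next failure risk metric: (t_m - T_F(t_t)) / t_m."""
    return (t_m - tf_tt) / t_m


def risk_region(metric_value: float) -> str:
    """Map a risk metric value to the region named in the text."""
    if metric_value > 0:
        return "critical"
    if metric_value == 0:
        return "neutral"
    return "desirable"


if __name__ == "__main__":
    # Hypothetical predictions: 0.6 remaining failures against r_c = 1,
    # and 4 days to next failure against an 8 day mission.
    print(risk_region(rcm_remaining_failures(0.6, 1.0)))
    print(risk_region(rcm_time_to_next_failure(4.0, 8.0)))
```

In this made-up case the remaining failures criterion is met while the time to next failure criterion is not, which is the situation where more testing would be indicated.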

One type of model and its predictions were just described. This model is part of the suite of our quality metrics and process stability models [SCH971, SCH972, SCH98]. Now we want to change our research emphasis to the prediction of reliability at the earliest possible time in the development process, the requirements analysis phase, which heretofore has been unattainable. We would also like to determine whether there exists a "standard" set of risk factors that could be applied in a variety of applications to reliability prediction. As the X-2000 project matures, and reliability and complexity data become available, we will be able to see whether the risk factors that we validate on the Shuttle are applicable to the X-2000 for predicting reliability risk. Also, we may discover additional risk factors on the X-2000 project. Lastly, we will be able to determine whether the numerical results of reliability classification and prediction obtained on the Shuttle scale to the X-2000.

Summary and Conclusions

We  show  how  software  reliability  predictions  can  increase  confidence  in  the  reliability  of 
safety  critical  software  such  as  the  NASA  Space  Shuttle  Primary  Avionics  Software  System 
(Shuttle  flight  software).  This  objective  was  achieved  using  a  novel  approach  to  integrate 
software  safety  criteria,  risk  analysis,  reliability  prediction,  and  stopping  rules  for  testing. 
This  approach  is  applicable  to  other  safety  critical  software.  We  only  cover  the  safety  of  the 
software in a safety critical system. The hardware and human operator components of such
systems are not explicitly modeled nor are the hardware and operator induced software
failures.  Our  concern  is  with  reducing  the  risk  of  all  failures  attributed  to  software.  Thus,  our 
use  of  the  word  safety  refers  to  software  safety  and  not  to  system  safety.  By  improving  the 
reliability  of  the  software,  where  the  reliability  measurements  and  predictions  are  directly 
related to mission and crew safety, we contribute to system safety.

Remaining failures, maximum failures, total test time required to attain a given fraction of
remaining failures, and time to next failure are shown to be useful reliability measurements
and predictions for: 1) providing confidence that the software has achieved safety goals; 2)
rationalizing how long to test a piece of software; and 3) analyzing the risk of not achieving
remaining failure and time to next failure goals. Having predictions of the extent that the
software is not fault free (remaining failures) and whether it is likely to survive a mission
(time to next failure) provides criteria for assessing the risk of deploying the software.
Furthermore,  fraction  of  remaining  failures  can  be  used  as  both  an  operational  quality  goal  in 
predicting  total  test  time  requirements  and,  conversely,  as  an  indicator  of  operational  quality 
as  a  function  of  total  test  time  expended. 

Software  reliability  models  provide  one  of  several  tools  that  software  managers  of  the 
Shuttle  flight  software  are  using  to  provide  confidence  that  the  software  meets  required  safety 
goals.  Other  tools  are  inspections,  software  reviews,  testing,  change  control  boards,  and 
perhaps  most  important  --  experience  and  judgement. 

1.  Introduction 

We  propose  that  two  categories  of  software  reliability  measurements  (i.e.,  observed  failure 
data  used  for  model  parameter  estimation)  and  predictions  (i.e.,  forecasts  of  future  reliability 
using  the  parameterized  model)  be  used  in  combination  to  assist  in  assuring  the  safety  of  the 
software  in  safety  critical  systems  like  the  Shuttle  flight  software.  The  two  categories  are:  1) 
measurements and predictions that are associated with residual software faults and failures,
and 2) measurements and predictions that are associated with the ability of the software to
survive a mission without experiencing a serious failure. In the first category are: remaining
failures, maximum failures, fraction of remaining failures, and total test time required to
attain a given number or fraction of remaining failures. In the second category are: time to
next failure and total test time required to attain a given time to next failure. In addition, we
define the risk associated with not attaining the required remaining failures and time to next
failure. Lastly, we derive a quantity from the fraction of remaining failures that we call
operational quality.
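
As a small illustration of how operational quality relates to the fraction of remaining failures, the sketch below treats operational quality as the complement of that fraction, following the Notation section. Taking the fraction as predicted remaining failures divided by predicted maximum failures is an assumption made for this example, as are the function names and input values.

```python
def fraction_remaining(remaining_failures: float, maximum_failures: float) -> float:
    """Fraction of remaining failures (assumed form: remaining / maximum)."""
    if maximum_failures <= 0:
        raise ValueError("maximum_failures must be positive")
    return remaining_failures / maximum_failures


def operational_quality(remaining_failures: float, maximum_failures: float) -> float:
    """Operational quality: the complement of the fraction of remaining failures."""
    return 1.0 - fraction_remaining(remaining_failures, maximum_failures)


if __name__ == "__main__":
    # Hypothetical predictions: 1 remaining failure out of a maximum of 20.
    print(operational_quality(1.0, 20.0))
```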

The  benefits  of  predicting  these  quantities  are:  1)  they  provide  confidence  that  the 
software  has  achieved  safety  goals,  and  2)  they  provide  a  means  of  rationalizing  how  long  to 
test a piece of software (stopping rule). Having predictions of the extent that the software is
not fault free (remaining failures) and its ability to survive a mission (time to next failure) is
meaningful for assessing the risk of deploying safety critical software. In addition, with this
type  of  information  a  software  manager  can  determine  whether  more  testing  is  warranted  or 
whether  the  software  is  sufficiently  tested  to  allow  its  release  or  unrestricted  use.  These 
predictions,  in  combination  with  other  methods  of  assurance,  such  as  inspections,  defect 
prevention,  project  control  boards,  process  assessment,  and  fault  tracking,  provide  a 
quantitative basis for achieving safety and reliability goals [3].

Risk in Webster's New Universal Unabridged Dictionary is defined as: "the chance of
injury, damage, or loss" [19]. Some authors have extended the dictionary definition as
follows: "Risk Exposure = Probability of an Unsatisfactory Outcome * Loss if the Outcome is
Unsatisfactory" [2]. Such a definition is frequently applied to the risks in managing software
projects such as budget and schedule slippage. In contrast, our application of the dictionary
definition pertains to the risk of executing the software of a safety critical system where there
is the chance of injury (e.g., astronaut injury or fatality), damage (e.g., destruction of the
Shuttle), or loss (e.g., loss of the mission) if a serious software failure occurs during a
mission. We have developed risk criterion metrics to quantify the degree of risk associated
with such an occurrence.

Lockheed-Martin, the primary contractor on the Shuttle flight software project, is
experimenting with a promising algorithm which involves the use of the Schneidewind
Software Reliability Model to compute a parameter: fraction of remaining failures as a
function of the archived failure history during test and operation [10]. Our prediction
methodology uses this parameter and other reliability quantities to provide bounds on total
test time, remaining failures, operational quality, and time to next failure that are necessary to
meet Shuttle safety requirements. We also show that there is a pronounced asymptotic
characteristic to the total test time and operational quality curves that indicates the possibility
of big gains in reliability as testing continues; eventually the gains become marginal.
We conclude that the prediction methodology is feasible for the Shuttle and other
safety critical systems.

We only cover the safety of the software in a safety critical system. The hardware and
human operator components of such systems are not explicitly modeled nor are the hardware
and operator induced software failures. However, in practice, these hardware-software
interface and human operator-software interface failures may be very difficult to identify as
such; these failures may be recorded as software failures. Our concern is with reducing the
risk of all failures attributed to software. Thus, our use of the word safety refers to software
safety and not to system safety.

Although remaining failures has been discussed in general as a type of software reliability
prediction  [13],  and  various  stopping  rules  for  testing  have  been  proposed,  based  on  costs  of 
testing  and  releasing  software  [4,  5,  8,  17],  failure  intensity  [12],  and  testability  [18],  our 
approach  is  novel  because  we  integrate  software  safety  criteria,  risk  analysis,  reliability 
prediction,  and  a  stopping  rule  for  testing.  For  a  system  like  the  Shuttle,  where  human  lives 
are  at  risk,  we  cannot  use  economic  or  time-to-market  criteria  to  determine  when  to  deploy 
the  software.  Although  failure  intensity  has  proven  useful  for  allocating  test  effort  and 
determining  when  to  stop  testing  in  commercial  systems  [12],  this  criterion  is  not  directly 
related  to  software  safety.  In  a  safety  critical  system,  the  prediction  of  remaining  failures  and 
identification  of  the  faults  which  cause  them  is  more  relevant  to  ensuring  safety  than  the  trend 
of failure intensity over time. The latent faults must be found and removed through additional
testing, inspection, or other means, if the safety of the mission is not to be jeopardized.
Furthermore, as we will show, remaining failures, along with time to next failure, can be used
as risk criteria. It is not clear how failure intensity could be a meaningful safety criterion.

Because  testability  attempts  to  quantify  the  probability  of  failure,  if  the  code  is  faulty  [18], 
this criterion has a relationship with reliability if we know that the code is faulty. However, in
the  Shuttle  and  other  safety  critical  software,  our  purpose  is  to  predict  whether  the  code  is 
faulty.  For  safety  critical  software,  we  must  use  reliability  measurements  and  predictions  to 
assess  whether  safety  and  mission  goals  are  likely  to  be  achieved. 

We  first  define  two  criteria  for  software  safety.  Then  we  apply  these  criteria  to  risk 
analysis  of  safety  critical  software,  using  the  Shuttle  flight  software  as  an  example.  Next,  we 
define  and  provide  brief  derivations  for  a  variety  of  prediction  equations  that  are  used  in 
reliability  prediction  and  risk  analysis;  included  is  the  relationship  between  time  to  next 
failure  and  reduction  in  remaining  failures.  This  is  followed  by  an  explanation  of  the 
principle of optimal selection of failure data that involves selecting only the most relevant set
of  failure  data  for  reliability  prediction,  with  the  result  of  producing  more  accurate 
predictions  than  would  be  the  case  if  the  entire  set  of  data  were  used.  Then  we  show  how  the 
prediction  equations  can  be  used  to  integrate  testing  with  reliability  and  quality.  An  example 
is  shown  of  how  the  risk  analysis  and  reliability  predictions  can  be  used  to  make  decisions 
about whether the software is safe to deploy. Lastly, we show validation results for a variety
of  predictions. 

Acronyms 

OIA: Shuttle operational increment A
OIB: Shuttle operational increment B
OIC: Shuttle operational increment C
OID: Shuttle operational increment D

Assumptions [1]:

1.  Faults  that  cause  failures  are  removed. 

2.  As  more  failures  occur  and  more  faults  are  corrected,  remaining  failures  will  be  reduced. 

3. The remaining failures are "zero" for those OI's that were executed for extremely long
times  (years)  with  no  additional  failure  reports;  correspondingly,  for  these  OI's,  maximum 
failures  equals  total  observed  failures. 

4.  The  number  of  failures  detected  in  one  interval  is  independent  of  the  failure  count  in 
another. 

5. Only "new" failures are counted (i.e., failures that are repeated as a consequence of not
correcting a fault are not counted).

Definitions 

o Interval: an integer time unit t of constant length defined by t-1<t<t+1, where t>0; failures
are counted in intervals (e.g., one failure occurred in interval 4) [1, 7].

o Number of Intervals: the number of contiguous integer time units t of constant length
represented by a positive real number (e.g., the predicted time to next failure is 3.87
intervals).

o Operational Increment (OI): a software system comprised of modules and configured from
a series of builds to meet Shuttle mission functional requirements.

o Time: continuous CPU execution time over an interval range.

Severity  Codes : 

1. Severe Vehicle or Crew Performance Implications.

2.  Affects  Ability  to  Complete  Mission  (Not  a  safety  issue). 

3.  Workaround  Available,  Minimal  Effect  on  Procedures. 

4.  Insignificant  (Paperwork,  etc.). 

5.  Not  Visible  to  User. 

Nomenclature

o  Predicted  at  time  t:  a  prediction  made  in  the  interval  t. 
o  Safety:  software  safety;  not  system  safety. 

Notation 

$\alpha$            failure rate at the beginning of interval $s$
$\beta$             negative of derivative of failure rate divided by failure rate (i.e., relative failure rate)
$F(i)$              predicted failure count in the range $[1,i]$; used in computing $MSE_r$
$F_{ij}$            observed failure count during interval $j$ since interval $i$; used in computing $MSE_T$
$F(t)$              predicted failure count in the range $[1,t]$
$F_t$               given number of failures to occur after interval $t$; used in predicting $T_F(t)$
$F(t_1,t_2)$        predicted failure count in the range $[t_1,t_2]$
$F(\infty)$         predicted failure count in the range $[1,\infty]$; maximum failures over the life of the software
$i$                 current interval
$j$                 next interval $j>i$ where $F_{ij}>0$
$J$                 maximum $j \le t$ where $F_{ij}>0$
$MSE_F$             mean square error criterion for selecting $s$ for failure count predictions
$MSE_r$             mean square error criterion for selecting $s$ for remaining failure predictions
$MSE_T$             mean square error criterion for selecting $s$ for time to next failure predictions
$p(t)$              fraction of remaining failures predicted at time $t$
$Q(t)$              operational quality predicted at time $t$; the complement of $p(t)$; the degree to which software is free of remaining faults (failures)
$r_c$               critical value of remaining failures; used in computing RCM $r(t_t)$
$r(t)$              remaining failures predicted at time $t$
$r(t_t)$            remaining failures predicted at total test time $t_t$
$\Delta r(T_F,t)$   reduction in remaining failures that would be achieved if the software were executed for a time $T_F$, predicted at time $t$
RCM $r(t_t)$        risk criterion metric for remaining failures at total test time $t_t$
RCM $T_F(t_t)$      risk criterion metric for time to next failure at total test time $t_t$
$s$                 starting interval for using observed failure data in parameter estimation
$s^*$               optimal starting interval for using observed failure data, as determined by MSE criterion
$t$                 cumulative time in the range $[1,t]$; last interval of observed failure data; current interval
$t_m$               mission duration (end time - start time); used in computing RCM $T_F(t_t)$
$t_t$               total test time (observed or predicted)
$T_F(t)$            time to next failure(s) predicted at time $t$
$T_F(t_t)$          time to next failure predicted at total test time $t_t$
$T_F(\Delta r,t)$   time to next N failures that would be achieved if remaining failures were reduced by $\Delta r$, predicted at time $t$
$T_{ij}$            time since interval $i$ to observe number of failures $F_{ij}$ during interval $j$; used in computing $MSE_T$
$X_i$               observed failure count in the range $[1,i]$
$X_{s-1}$           observed failure count in the range $[1,s-1]$
$X_{s,t}$           observed failure count in the range $[s,t]$
$X_{s,t_1}$         observed failure count in the range $[s,t_1]$
$X_t$               observed failure count in the range $[1,t]$
$X_{t_1}$           observed failure count in the range $[1,t_1]$


2.  Criteria  for  Safety 


If we define our safety goal as the reduction of failures that would cause loss of life, loss
of mission, or abort of mission to an acceptable level of risk [11], then for software to be
ready to deploy, after having been tested for total time tt, we must satisfy the following
criteria:


1) predicted remaining failures r(tt)<rc,
where rc is a specified critical value, and

2) predicted time to next failure TF(tt)>tm,
where tm is mission duration.


For systems that are tested and operated continuously like the Shuttle, tt, TF(tt), and tm are
measured in execution time. Note that, as with any methodology for assuring software safety,
we cannot guarantee safety. Rather, with these criteria, we seek to reduce the risk of
deploying the software to an acceptable level.


Using assumption 1 that the faults that cause failures are removed (this is the case for the
Shuttle), criterion 1 specifies that the residual failures and faults must be reduced to a level
where the risk of operating the software is acceptable. As a practical matter, we suggest rc=1.
That is, the goal would be to reduce the expected remaining failures to less than one before
deploying the software. The reason for this choice is that one or more remaining failures
would constitute unacceptable risk for safety critical systems. This is the threshold used by
the Shuttle software managers. One way to specify rc is by failure severity level (e.g., severity
level 1 for life threatening failures). Another way, which imposes a more demanding safety
requirement, is to specify that rc represents all severity levels. For example, r(tt)<1 would
mean that r(tt) must be less than one failure, independent of severity level.

If we predict r(tt)>rc, we would continue to test for a total time tt'>tt that is predicted to
achieve r(tt')<rc, using assumption 2 that we will experience more failures and correct more
faults so that the remaining failures will be reduced by the quantity r(tt)-r(tt'). If the developer
does not have the resources to satisfy the criterion or is unable to satisfy the criterion through
additional testing, the risk of deploying the software prematurely should be assessed (see the
next section). We know from Dijkstra's dictum that we cannot demonstrate the absence of
faults [6]; however, we can reduce the risk of failures occurring to an acceptable level, as
represented by rc. This scenario is shown in Figure 1. In case A we predict r(tt)<rc and the
mission begins at tt. In case B we predict r(tt)>rc and postpone the mission until we test for
total time tt' and predict r(tt')<rc. In both cases criterion 2) must also be satisfied for the
mission to begin.

2.2  Time  to  Next  Failure  Criterion 

Criterion 2 specifies that the software must survive for a time greater than the duration of
the mission. If we predict TF(tt)<tm, we would continue to test for a total time tt">tt that is
predicted to achieve TF(tt")>tm, using assumption 2 that we will experience more failures and
correct more faults so that the time to next failure will be increased by the quantity
TF(tt")-TF(tt). Again, if it is infeasible for the developer to satisfy the criterion for lack of
resources or failure to achieve test objectives, the risk of deploying the software prematurely
should be assessed (see the next section). This scenario is shown in Figure 2. In case A we predict
TF(tt)>tm and the mission begins at tt. In case B we predict TF(tt)<tm and postpone the mission
until we test for total time tt" and predict TF(tt")>tm. In both cases criterion 1) must also be
satisfied for the mission to begin. If neither criterion is satisfied, we test for a time which is
the greater of tt' or tt".
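
The two criteria can be combined into a single readiness check. The following is a minimal sketch under stated assumptions, not the tooling used for the Shuttle: the function name `deployment_decision` and its return structure are hypothetical, and the sample inputs are the OIA values quoted later in this paper (r(18)=4.76, TF(18)=3.87, rc=1, tm=.267 intervals).

```python
def deployment_decision(r_tt, tf_tt, r_c=1.0, t_m=0.267):
    """Apply safety criteria 1 and 2 to mean-value predictions.

    r_tt  : predicted remaining failures at total test time tt
    tf_tt : predicted time to next failure at tt (in intervals)
    r_c   : critical value of remaining failures (hypothetical default 1.0)
    t_m   : mission duration in intervals (hypothetical default, 8 days / 30)
    """
    criterion_1 = r_tt < r_c      # remaining failures below the critical value
    criterion_2 = tf_tt > t_m     # time to next failure exceeds mission duration
    return {
        "criterion_1": criterion_1,
        "criterion_2": criterion_2,
        "deploy": criterion_1 and criterion_2,  # both must hold for the mission to begin
    }


# OIA at tt=18: criterion 2 holds, criterion 1 does not, so testing continues.
print(deployment_decision(r_tt=4.76, tf_tt=3.87))
```

Both criteria must hold before deployment, which is why the sketch returns the conjunction rather than either flag alone.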

3.  Risk  Assessment 

The amount of total test time tt can be considered a measure of the degree to which
software reliability goals have been achieved. This is particularly the case for systems like the
Shuttle where the software is subjected to continuous and rigorous testing for several years in
multiple facilities, using a variety of operational and training scenarios (e.g., by Lockheed-Martin
in Houston, by NASA in Houston for astronaut training, and by NASA at Cape
Kennedy). If we view tt as an input to a risk reduction process, and r(tt) and TF(tt) as the
outputs, we can portray the process as shown in Figure 3, where rc and tm are shown as "risk
criteria levels" of safety that control the process. While we recognize that total test time is not
the only consideration in developing test strategies and that there are other important factors,
like the consequences for reliability and cost, in selecting test cases [20], nevertheless, for the
foregoing reasons, total test time has been found to be strongly positively correlated with
reliability growth for the Shuttle [15].

3.1  Remaining  Failures 
We can formulate the mean value of the risk criterion metric (RCM) for criterion 1 as
follows:

$$\mathrm{RCM}\; r(t_t) = \frac{r(t_t) - r_c}{r_c} = \frac{r(t_t)}{r_c} - 1 \tag{3}$$

We plot equation (3) in Figure 4 as a function of tt for rc=1, where positive, zero, and
negative values correspond to r(tt)>rc, r(tt)=rc, and r(tt)<rc, respectively. In Figure 4, these
values correspond to the following regions: UNSAFE (i.e., above the X-axis, predicted
remaining failures are greater than the "safe" value); NEUTRAL (i.e., on the X-axis, predicted
remaining failures are equal to the "safe" value); and SAFE (i.e., below the X-axis, predicted
remaining failures are less than the "safe" value).

This graph is for the Shuttle operational increment OID. In this example we see that at
approximately tt=57 the risk transitions from the UNSAFE region to the SAFE region.

3.2  Time  to  Next  Failure 

Similarly, we can formulate the mean value of the risk criterion metric (RCM) for
criterion 2 as follows:

$$\mathrm{RCM}\; T_F(t_t) = \frac{t_m - T_F(t_t)}{t_m} = 1 - \frac{T_F(t_t)}{t_m} \tag{4}$$

We plot equation (4) in Figure 5 as a function of tt for tm=8 days (a typical mission duration
time for this OI), where positive, zero, and negative risk corresponds to TF(tt)<tm, TF(tt)=tm,
and TF(tt)>tm, respectively. In Figure 5, these values correspond to the following regions:
UNSAFE (i.e., above the X-axis, predicted time to next failure is less than the "safe" value);
NEUTRAL (i.e., on the X-axis, predicted time to next failure is equal to the "safe" value); and
SAFE (i.e., below the X-axis, predicted time to next failure is greater than the "safe" value).
This graph is for the Shuttle operational increment OIC. In this example we see that at
all values of tt the RCM is in the SAFE region.
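
Equations (3) and (4) and the three regions can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the sample inputs are the OIA values reported later in this paper (r(18)=4.76, TF(18)=3.87, rc=1, tm=.267 intervals), not the OID and OIC data plotted in Figures 4 and 5.

```python
def rcm_remaining_failures(r_tt, r_c):
    """Equation (3): risk criterion metric for remaining failures."""
    return (r_tt - r_c) / r_c


def rcm_time_to_next_failure(tf_tt, t_m):
    """Equation (4): risk criterion metric for time to next failure."""
    return (t_m - tf_tt) / t_m


def risk_region(rcm):
    """Map an RCM value to the region used in Figures 4 and 5."""
    if rcm > 0:
        return "UNSAFE"    # above the X-axis
    if rcm < 0:
        return "SAFE"      # below the X-axis
    return "NEUTRAL"       # on the X-axis


rcm_r = rcm_remaining_failures(4.76, 1.0)
rcm_tf = rcm_time_to_next_failure(3.87, 0.267)
print(rcm_r, risk_region(rcm_r))
print(rcm_tf, risk_region(rcm_tf))
```

The signs are arranged so that a positive value means higher risk for both metrics, even though a larger r(tt) is worse while a larger TF(tt) is better.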


4.  Approach  to  Prediction 

In order to support our safety goal and to assess the risk of deploying the software, we
make various reliability and quality predictions. In addition, we use these predictions to
perform tradeoff analysis between reliability and total test time. Thus, our approach is to use a
software reliability model to predict the following: 1) maximum failures, remaining failures,
and operational quality (as defined in the next section); 2) time to next failure (beyond the
last observed failure); 3) total test time necessary to achieve required levels of remaining
failures (faults), operational quality, and time to next failure; and 4) tradeoffs between
increases in levels of reliability and quality with increases in testing.

5.  Prediction  Equations 

The following prediction equations are based on the Schneidewind Software Reliability
Model [1, 14, 15, 16], one of the four models recommended in the AIAA Recommended
Practice for Software Reliability [1]. These equations use assumptions 4-7 in the Introduction.
We derive these equations in the next section. We apply them to analyze the reliability of the
Shuttle flight software. All predictions are mean values.

Because the flight software is run continuously, around the clock, in simulation, test, or
flight, time refers to continuous execution time and total test time refers to execution time
that is used for testing. Failure count intervals are equal to 30 days of continuous execution
time. This interval is long because the Shuttle software is tested for several years; a 30 day
interval length is convenient for recording failures for software that is tested this long.

In the following equations, the parameter α is the failure rate at the beginning of interval
s; the parameter β is the negative of the derivative of failure rate divided by failure rate (i.e.,
relative failure rate); t is the last interval of observed failure data; s is the starting interval for
using observed failure data in parameter estimation that will result in the best estimates of α
and β and the most accurate predictions [14]; X_{s-1} is the observed failure count in the range
[1,s-1]; X_{s,t} is the observed failure count in the range [s,t]; and X_t=X_{s-1}+X_{s,t}. These failure
count interval relationships are shown in Figure 6; also shown is total test time tt. Failures are
counted against operational increments (OIs). Data from four Shuttle OIs, designated OIA,
OIB, OIC, and OID, are used in this analysis.

5.1  Cumulative  Failures 

When maximum likelihood estimates are obtained for the parameters α and β, with s as the
starting interval for using observed failure data, we obtain the predicted failure count in the
range [s,t]:

$$F_{s,t} = \frac{\alpha}{\beta}\left[1 - \exp(-\beta(t - s + 1))\right] \tag{5}$$

Furthermore, if we add X_{s-1}, the observed failure count in the range [1,s-1], we obtain the
predicted failure count in the range [1,t]:

$$F(t) = \frac{\alpha}{\beta}\left[1 - \exp(-\beta(t - s + 1))\right] + X_{s-1} \tag{6}$$
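
Equations (5) and (6) translate directly into code. The sketch below is illustrative; the function names are hypothetical. The OIA parameter values (α=.534, β=.061, s=9, t=18) come from Table 2, and X_{s-1}=3 is our reading of the Appendix failure counts for OIA intervals 1 to 8.

```python
import math


def predicted_failures_s_to_t(alpha, beta, s, t):
    """Equation (5): predicted failure count in the range [s, t]."""
    return (alpha / beta) * (1.0 - math.exp(-beta * (t - s + 1)))


def predicted_failures_1_to_t(alpha, beta, s, t, x_s_minus_1):
    """Equation (6): add the observed count in [1, s-1] to equation (5)."""
    return predicted_failures_s_to_t(alpha, beta, s, t) + x_s_minus_1


# OIA: the prediction for [1, 18] comes out close to the 7 failures observed.
f_18 = predicted_failures_1_to_t(alpha=0.534, beta=0.061, s=9, t=18, x_s_minus_1=3)
print(round(f_18, 2))
```

Only the data from interval s onward enter the exponential term; the earlier failures appear solely through the observed count X_{s-1}.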

5.2  Failures  in  an  Interval  Range 

If we set t=t2 and subtract X_{t1}=X_{s-1}+X_{s,t1}, the observed failure count in the range [1,t1],
from equation (6), we obtain the predicted failure count in the range [t1,t2]:

$$F(t_1,t_2) = \frac{\alpha}{\beta}\left[1 - \exp(-\beta(t_2 - s + 1))\right] - X_{s,t_1} \tag{7}$$

5.3 Maximum Failures

If we let t→∞ in equation (6), we obtain the predicted failure count in the range [1,∞]
(i.e., maximum failures over the life of the software):

$$F(\infty) = \frac{\alpha}{\beta} + X_{s-1} \tag{8}$$

5.4 Remaining Failures

To obtain predicted remaining failures r(t) at time t, we subtract X_t=X_{s-1}+X_{s,t} from
equation (8):

$$r(t) = \frac{\alpha}{\beta} - X_{s,t} = F(\infty) - X_t \tag{9}$$

r(t) can also be expressed as a function of total test time tt by substituting equation (5) into
equation (9) and setting t=tt:

$$r(t_t) = \frac{\alpha}{\beta}\exp\left(-\beta\,[t_t - (s - 1)]\right) \tag{10}$$

5.5 Fraction of Remaining Failures

If we divide equation (9) by equation (8), we obtain fraction of remaining failures
predicted at time t:

$$p(t) = \frac{r(t)}{F(\infty)} \tag{11}$$

5.6  Operational  Quality 

The  operational  quality  of  software  is  the  complement  of  p(t).  It  is  the  degree  to  which 
software  is  free  of  remaining  faults  (failures),  using  assumption  1  that  the  faults  that  cause 
failures  are  removed.  It  is  predicted  at  time  t  as  follows: 

$$Q(t) = 1 - p(t) \tag{12}$$
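
The chain from maximum failures to operational quality, equations (8) through (12), can be checked numerically. This is a minimal sketch with hypothetical function names. The inputs are the OIA values from Table 2 (α=.534, β=.061, s=9, tt=18); the observed counts X_{s-1}=3 and X_{s,t}=4 are our reading of the Appendix data.

```python
import math


def max_failures(alpha, beta, x_s_minus_1):
    """Equation (8): predicted maximum failures over the life of the software."""
    return alpha / beta + x_s_minus_1


def remaining_failures(alpha, beta, x_s_t):
    """Equation (9): remaining failures from the observed count in [s, t]."""
    return alpha / beta - x_s_t


def remaining_failures_at(alpha, beta, s, t_t):
    """Equation (10): remaining failures as a function of total test time."""
    return (alpha / beta) * math.exp(-beta * (t_t - (s - 1)))


def fraction_remaining(r, f_inf):
    """Equation (11): fraction of remaining failures."""
    return r / f_inf


def operational_quality(p):
    """Equation (12): complement of the fraction of remaining failures."""
    return 1.0 - p


f_inf = max_failures(0.534, 0.061, x_s_minus_1=3)
r_obs = remaining_failures(0.534, 0.061, x_s_t=4)
r_18 = remaining_failures_at(0.534, 0.061, s=9, t_t=18)
p_18 = fraction_remaining(r_18, f_inf)
print(round(f_inf, 2), round(r_obs, 2), round(r_18, 2),
      round(p_18, 2), round(operational_quality(p_18), 2))
```

Equations (9) and (10) agree here to within rounding because the predicted count in [9,18] is very close to the 4 failures actually observed in that range.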

5.7  Total  Test  Time  to  Achieve  Specified  Remaining  Failures 

The predicted total test time required to achieve a specified number of remaining failures
at tt, r(tt), is obtained from equation (10) by solving for tt:

$$t_t = \frac{1}{\beta}\log\left[\frac{\alpha}{\beta\, r(t_t)}\right] + (s - 1) \tag{13}$$
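
Because equation (13) is the inverse of equation (10), a round trip through both is a convenient sanity check. The sketch below uses hypothetical function names and the OIA parameters from Table 2; the target r(tt)=.60 is the value discussed in Section 8.

```python
import math


def remaining_failures_at(alpha, beta, s, t_t):
    """Equation (10): remaining failures after total test time t_t."""
    return (alpha / beta) * math.exp(-beta * (t_t - (s - 1)))


def total_test_time(alpha, beta, s, r_target):
    """Equation (13): total test time needed to reach r_target remaining failures."""
    if r_target <= 0:
        raise ValueError("r_target must be positive; the model reaches 0 only as t -> infinity")
    return math.log(alpha / (beta * r_target)) / beta + (s - 1)


# OIA: test time needed to bring predicted remaining failures down to 0.60.
tt_needed = total_test_time(alpha=0.534, beta=0.061, s=9, r_target=0.60)
print(round(tt_needed, 1))

# Round trip: the remaining failures at tt=18 map back to tt=18.
r_18 = remaining_failures_at(0.534, 0.061, 9, 18)
print(round(total_test_time(0.534, 0.061, 9, r_18), 6))
```

The logarithm explains why the required test time grows without bound as the target number of remaining failures approaches zero.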

5.8  Time  to  Next  Failure 

By substituting t2=t+TF(t) in equation (7), setting t1=t, defining Ft=F(t,t+TF), and solving
for TF(t), we obtain the predicted time for the next Ft failures to occur, when the current time
is t:

$$T_F(t) = \frac{1}{\beta}\log\left[\frac{\alpha}{\alpha - \beta(X_{s,t} + F_t)}\right] - (t - s + 1) \quad \text{for } \frac{\alpha}{\beta} > X_{s,t} + F_t \tag{14}$$

The terms in TF(t) have the following definitions:

t: Current interval;

X_{s,t}: Observed failure count in the range [s,t]; and

Ft: Given number of failures to occur after interval t.
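
Equation (14) and its validity condition can be sketched as follows. The function name is hypothetical, and returning `None` when the condition (α/β) > (X_{s,t}+Ft) fails is a design choice of this sketch, not something the paper prescribes. The OIA inputs are from Table 2, with X_{s,t}=4 taken from our reading of the Appendix counts.

```python
import math


def time_to_next_failures(alpha, beta, s, t, x_s_t, f_t=1):
    """Equation (14): predicted time for the next f_t failures to occur.

    Returns None when alpha/beta <= x_s_t + f_t, i.e. when the model predicts
    fewer than f_t failures remaining and no finite time exists.
    """
    denom = alpha - beta * (x_s_t + f_t)
    if denom <= 0:
        return None
    return math.log(alpha / denom) / beta - (t - s + 1)


# OIA at t=18: time to the next single failure, in 30 day intervals.
print(time_to_next_failures(0.534, 0.061, s=9, t=18, x_s_t=4, f_t=1))

# Asking for more failures than the model predicts remain yields None.
print(time_to_next_failures(0.534, 0.061, s=9, t=18, x_s_t=4, f_t=5))
```

With these rounded parameters the first call gives a value near the 3.87 intervals reported for OIA; small differences reflect rounding of α and β.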

We consider equations (5)-(11) and (14) to be predictors of reliability that are related to
safety; equation (13) represents the predicted total test time required to achieve stated safety
goals. If a quality requirement is stated in terms of fraction of remaining failures, the
definition of Q as Operational Quality, equation (12), is consistent with the IEEE definition
of quality: the degree to which a system, component, or process meets specified requirements
[9]. For example, if a reliability specification requires that software is to have no more than
5% remaining failures (i.e., p=.05, Q=.95) after testing for a total of tt intervals, then a
predicted Q of .90 would indicate the degree to which the software meets specified
requirements.

5.9  Relating  Time  to  Next  N  Failures  and  Remaining  Failures  Predictions 

Although we have shown the risk analysis and prediction equations for remaining failures
and time to next failure separately, it would be useful to combine these quantities in one
equation so that we can predict the effect on one quantity for a given change in the other. In
particular we want to predict, at time t, the time to the next N failures, TF(Δr,t), that would be
achieved if remaining failures were reduced by Δr. We use assumption 1 that N=Δr; that is,
faults that cause failures are removed. When N=1, we have the familiar time to next failure.
When N>1, TF(Δr,t) is interpreted as cumulative execution time for the N failures to occur.
Conversely, we want to predict, at time t, the reduction in remaining failures, Δr(TF,t), that
would be achieved if the software were executed for a time TF. This relationship is derived by
using equation (10) and setting Δr=r(t1)-r(tt), tt=t1+Δt, and t1=t, and solving for Δt=TF(Δr,t):

$$T_F(\Delta r,t) = -\frac{1}{\beta}\log\left[1 - \frac{\beta\,\Delta r}{\alpha}\exp(\beta(t - s + 1))\right] \tag{15}$$

for (βΔr/α)exp(β(t-s+1)) < 1.

Equation (15) is analogous to equation (14). Also, Δr in equation (15) is analogous to Ft in
equation (14), if we use assumption 1 that the faults that cause the Ft failures are removed,
with a corresponding reduction in remaining failures. The two equations produce the same
result for the same parameter values. Equation (15) has the advantage of being a simpler
computation because it does not require the observed data vector X_{s,t}, which is used in
equation (14). Also, equation (15) is convenient to use for trading off time to next N failures
against reduction in remaining failures, and the effort and the total test time implicit in
making the reductions.

We can invert equation (15) to solve for the reduction in remaining failures that would be
achieved by executing the software for a time TF:

$$\Delta r(T_F,t) = \frac{\alpha}{\beta}\left[\exp(-\beta(t - s + 1))\right]\left[1 - \exp(-\beta\,T_F)\right] \tag{16}$$
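
Equations (15) and (16) are inverses of each other, which the following sketch demonstrates. The function names are hypothetical, returning `None` when the condition on equation (15) fails is an assumption of this sketch, and the inputs are the OIA parameters from Table 2.

```python
import math


def time_to_next_n_failures(alpha, beta, s, t, delta_r):
    """Equation (15): time to next N failures for a reduction delta_r in remaining failures."""
    arg = 1.0 - (beta * delta_r / alpha) * math.exp(beta * (t - s + 1))
    if arg <= 0:
        return None  # condition (beta*delta_r/alpha)*exp(beta*(t-s+1)) < 1 not met
    return -math.log(arg) / beta


def reduction_in_remaining_failures(alpha, beta, s, t, t_f):
    """Equation (16): reduction in remaining failures from executing for time t_f."""
    return (alpha / beta) * math.exp(-beta * (t - s + 1)) * (1.0 - math.exp(-beta * t_f))


# OIA at t=18: a reduction of one remaining failure.
tf = time_to_next_n_failures(0.534, 0.061, s=9, t=18, delta_r=1.0)
print(round(tf, 2))

# Feeding that time back into equation (16) recovers the reduction of 1.
print(round(reduction_in_remaining_failures(0.534, 0.061, 9, 18, tf), 6))
```

Neither function needs the observed failure counts; only α, β, s, and t appear, which is the computational advantage noted above.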

6.  Criterion  for  Optimally  Selecting  Failure  Data 

The first step in identifying the optimal value of s (s*) is to estimate the parameters α and β
for each value of s in the range [1,t] where convergence can be obtained [1,14,16]. Then the
Mean Square Error (MSE) criterion is used to select s*, the failure count interval that
corresponds to the minimum MSE between predicted and actual failure counts (MSEF), time
to next failure (MSET), or remaining failures (MSEr), depending on the type of prediction.
The first two were reported in [14]. In this paper we develop MSEr. MSEr is also the criterion
for maximum failures (F(∞)) and total test time (tt) because the two are functionally related to
remaining failures (r(t)); see equations (9) and (13). We also show MSET because it is used in
predictions that involve time to next failure: TF(t), TF(Δr,t), and Δr(TF,t). Once α, β, and s are
estimated from observed counts of failures, the foregoing predictions can be made. The
reason MSE is used to evaluate which triple (α, β, s) is best in the range [1,t] is that research
has shown that because the product and process change over the life of the software, old
failure data (i.e., s=1) are not as representative of the current state of the product and process
as the more recent failure data (i.e., s>1) [14]. The optimal values of s (s*) that were used in
the risk analysis and prediction examples are shown in Tables 1-4.


The Statistical Modeling and Estimation of Reliability Functions for Software (SMERFS)
[7] is used for all predictions except tt, TF(Δr,t), and Δr(TF,t), which are not implemented in
SMERFS.


Although we can never know whether additional failures may occur, nevertheless we can
form the difference between two equations for r(t): (9), which is a function of predicted
maximum failures and the observed failures, and (10), which is a function of total test time,
and apply the MSE criterion. This yields the following Mean Square Error (MSEr) criterion
for number of remaining failures:

$$\mathrm{MSE}_r = \frac{\sum_{i=s}^{t}\left[F(i) - X_i\right]^2}{t - s + 1} \tag{17}$$

where F(i) is the predicted failure count in the range [1,i] and X_i is the observed failure count
in the range [1,i].
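
The MSEr criterion can be sketched as below for one candidate triple (α, β, s). The function name and the `OIA_COUNTS` list are illustrative: the counts are our reading of the OIA column in the Appendix, and the parameters are the OIA values in Table 2. Selecting s* would require re-estimating α and β by maximum likelihood for every candidate s and keeping the s with the smallest MSEr; that estimation step is not shown here.

```python
import math

# Observed failures per 30 day interval for OIA, intervals 1..18 (from the Appendix).
OIA_COUNTS = [0, 0, 0, 1, 0, 0, 0, 2, 0, 2, 0, 0, 0, 1, 0, 0, 0, 1]


def mse_remaining_failures(alpha, beta, s, t, counts):
    """MSE criterion for remaining failures, for one candidate (alpha, beta, s).

    counts[k] is the number of failures observed in interval k+1.
    """
    x_s_minus_1 = sum(counts[:s - 1])            # observed count in [1, s-1]
    squared_error = 0.0
    for i in range(s, t + 1):
        # Equation (6): predicted cumulative failures in [1, i]
        f_i = (alpha / beta) * (1.0 - math.exp(-beta * (i - s + 1))) + x_s_minus_1
        x_i = sum(counts[:i])                    # observed cumulative failures in [1, i]
        squared_error += (f_i - x_i) ** 2
    return squared_error / (t - s + 1)


print(mse_remaining_failures(0.534, 0.061, s=9, t=18, counts=OIA_COUNTS))
```

The sum runs only over intervals s through t, so data before the starting interval affect the criterion only through the constant X_{s-1}.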

6.2  Mean  Square  Error  Criterion  for  Time  to  Next  Failures 

The Mean Square Error (MSET) criterion for time to next failure(s), which was derived in
[14], is given by equation (18):

$$\mathrm{MSE}_T = \frac{\sum_{i=s}^{J-1}\left[\left(\frac{1}{\beta}\log\left[\frac{\alpha}{\alpha - \beta(X_{s,i} + F_{ij})}\right] - (i - s + 1)\right) - T_{ij}\right]^2}{J - s} \quad \text{for } \frac{\alpha}{\beta} > X_{s,i} + F_{ij} \tag{18}$$

The terms in MSET have the following definitions:

i: Current interval;

j: Next interval j>i where Fij>0;

X_{s,i}: Observed failure count in the range [s,i];

Fij: Observed failure count during interval j since interval i;

Tij: Time since i to observe number of failures Fij during j (i.e., Tij=j-i);

t: The last interval of observed failure data; and

J: Maximum j≤t where Fij>0.
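
A sketch of the MSET computation follows. It rests on our interpretation of the definitions above: Fij is taken to be the failure count observed in interval j, the first interval after i that contains a failure, and J is the last interval containing a failure. The function name is hypothetical, raising an error when the validity condition fails is a choice made for this sketch, and `OIA_COUNTS` is our reading of the OIA column in the Appendix.

```python
import math

# Observed failures per 30 day interval for OIA, intervals 1..18 (from the Appendix).
OIA_COUNTS = [0, 0, 0, 1, 0, 0, 0, 2, 0, 2, 0, 0, 0, 1, 0, 0, 0, 1]


def mse_time_to_next_failure(alpha, beta, s, counts):
    """MSE criterion for time to next failure, for one candidate (alpha, beta, s)."""
    failure_intervals = [k + 1 for k, c in enumerate(counts) if c > 0]
    if not failure_intervals:
        raise ValueError("no failures observed")
    big_j = max(failure_intervals)                 # J: last interval with a failure
    if big_j <= s:
        raise ValueError("need J > s to form at least one term")
    squared_error = 0.0
    for i in range(s, big_j):                      # i = s, ..., J-1
        j = min(k for k in failure_intervals if k > i)   # next interval with failures
        f_ij = counts[j - 1]                       # failures observed in interval j
        t_ij = j - i                               # actual time to those failures
        x_s_i = sum(counts[s - 1:i])               # observed count in [s, i]
        denom = alpha - beta * (x_s_i + f_ij)
        if denom <= 0:
            raise ValueError("condition alpha/beta > X_{s,i} + F_ij violated at i=%d" % i)
        predicted = math.log(alpha / denom) / beta - (i - s + 1)
        squared_error += (predicted - t_ij) ** 2
    return squared_error / (big_j - s)


print(mse_time_to_next_failure(0.534, 0.061, s=9, counts=OIA_COUNTS))
```

Each term compares the predicted time to the next Fij failures, made from interval i, with the time Tij actually observed.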


7.  Relating  Testing  to  Reliability  and  Quality 
7.1 Predicting Total Test Time and Remaining Failures

We use equation (8) to predict maximum failures (F(∞)=11.76) for Shuttle OIA. Using
given values of p and equation (11) and setting t=tt, we predict r(tt) for each value of p. The
values of r(tt) are the predictions of remaining failures after the OI has been executed for total
test time tt. Then we use the values of r(tt) and equation (13) to predict corresponding values
of tt. The results are shown in Figure 7, where r(tt) and tt are plotted against p for OIA. Note
that required total test time tt rises very rapidly at small values of p and r(tt). Also note that the
maximum value of p on the plot corresponds to tt=18 and that smaller values correspond to
future values of tt (i.e., tt>18).

7.2 Predicting Operational Quality

Equation  (12)  is  a  useful  measure  of  the  operational  quality  of  software  because  it 
measures  the  degree  to  which  faults  have  been  removed  from  the  software  (using  assumption 
1  that  the  faults  that  cause  failures  are  removed),  relative  to  predicted  maximum  failures.  We 
call  this  type  of  quality  operational  (i.e.,  based  on  executing  the  software)  to  distinguish  it 
from  static  quality  (e.g.,  based  on  the  complexity  of  the  software). 

Using given values of p and equations (11) and (12) and setting t=tt, we compute r(tt) and
Q, respectively. The values of r(tt) are then used in equation (13) to compute tt. The
corresponding values of Q and tt are plotted in Figure 8 as Operational Quality and Total Test
Time, respectively, for OIA. We again observe the asymptotic nature of the testing relationship
in the great amount of testing required to achieve high levels of quality.
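
The procedure just described (choose p, obtain r(tt) from equation (11), then tt from equation (13)) can be sketched as a sweep over quality targets. The function name and the list of targets are hypothetical; the OIA parameters are from Table 2 and X_{s-1}=3 is our reading of the Appendix counts.

```python
import math


def total_test_time_for_quality(alpha, beta, s, x_s_minus_1, q_target):
    """Total test time needed to reach operational quality q_target (0 < q_target < 1)."""
    if not 0.0 < q_target < 1.0:
        raise ValueError("q_target must lie strictly between 0 and 1")
    f_inf = alpha / beta + x_s_minus_1                  # equation (8)
    r_target = (1.0 - q_target) * f_inf                 # equations (11) and (12)
    return math.log(alpha / (beta * r_target)) / beta + (s - 1)   # equation (13)


for q in (0.60, 0.80, 0.90, 0.95, 0.99):
    tt = total_test_time_for_quality(0.534, 0.061, s=9, x_s_minus_1=3, q_target=q)
    print(f"Q={q:.2f}  tt={tt:6.1f} intervals")
```

Each further gain in Q costs more test time than the last, which is the asymptotic behavior noted for Figure 8.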

7.3  Predicting  Time  to  Next  Failure 

First, Figure 9 shows, on the solid curve, the actual time to next failure for OIA over the
execution time range t=[1,18], where one failure occurred at t=4, 14, and 18, and two failures
occurred at t=8 and 10. All failures were Severity Level 3: "Workaround available; minimal
effect on procedures". The way to read the graph is as follows: If we take a
given failure, Failure 1, for example, it occurs at t=4; therefore, at t=1 the time to next
failure=3 (4-1); at t=2 the time to next failure=2 (4-2); at t=4 Failure 1 occurs, so the time to
next failure=4 (8-4) now refers to Failure 2, etc. Next, using equation (14), we predict the
time to next failure TF(18) to be 4 (3.87 rounded) on the dashed curve. Based on the
foregoing, this prediction indicates we should continue testing if TF(18)=3.87<tm (mission
duration).
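
The reading of the solid curve can be reproduced from the failure intervals alone. The sketch below is illustrative; the function name is hypothetical, and the OIA failure intervals (4, 8, 10, 14, 18) are those given in the text.

```python
def actual_time_to_next_failure(failure_intervals, t):
    """Intervals from t until the next interval strictly after t that contains a failure.

    Returns None if no later failure has been observed.
    """
    later = [f for f in failure_intervals if f > t]
    return min(later) - t if later else None


oia_failure_intervals = [4, 8, 10, 14, 18]
for t in (1, 2, 4, 18):
    print(t, actual_time_to_next_failure(oia_failure_intervals, t))
```

At t=4 the function looks past Failure 1 to Failure 2 at t=8, matching the convention described above; at t=18 no later failure has been observed, so the actual value is unknown.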

7.4 Predicting Tradeoffs of Time to Next N Failures with Reduced Remaining Failures

By using equation (15), we can predict time to next N failures, TF(Δr,t), as a function of
reduction in remaining failures, Δr. This is shown in Figure 10 for OIA, where, for example,
with Δr=1, we predict TF(1,18)=3.87 (i.e., a reduction in remaining failures of 1 corresponds
to achieving a time to next failure of 3.87 intervals from the current interval 18). Conversely,
by using equation (16), we predict reduction in remaining failures, Δr(TF,t), as a function of
time to next failure, TF. This is shown in Figure 11 for OIA, where, for example, with
TF=3.87, we predict Δr(3.87,18)=1 (i.e., executing OIA for a time to next failure of 3.87
intervals from the current interval 18 corresponds to achieving a reduction in remaining
failures of 1). We provide further elaboration of these graphs in the next section.

8.  Making  Safety  Decisions 

In making the decision about how long to test, tt, we apply our safety criteria and risk
assessment approach. We use Table 1 to illustrate the process. For tt=18 (when the last failure
occurred on OIA), rc=1, and tm=8 days (.267 intervals), we show remaining failures, RCM for
remaining failures, time to next failure, RCM for time to next failure, and operational quality.
These results indicate that safety criterion 2 is satisfied but not criterion 1 (i.e., UNSAFE with
respect to remaining failures); also, operational quality is low.

By looking at Figure 10 and Table 1, we see that if we reduce remaining failures r(18) by
1 from 4.76 to 3.76 (non-integer values are possible because the predictions are mean values),
the predicted time to next failure that would be achieved is TF(18)=3.87 intervals. These
predictions satisfy criterion 2 (i.e., TF(18)=3.87>tm=.267) but not criterion 1 (i.e.,
r(18)=4.76>rc=1). Note also in Figure 10 and Table 1 that fraction of remaining failures
p=1-Q=.40 at r(18)=4.76. Now, if we continue testing for a total time tt=52 intervals, as shown in
Figure 10 and Table 1, and reduce remaining failures from 4.76 to .60, the predicted time to
next 4.16 failures that would be achieved is 33.94 (34, rounded) intervals. This corresponds to
tt=18+34=52 intervals. That is, if we test for an additional 34 intervals, starting at interval 18,
we would expect to experience 4.16 failures. These predictions now satisfy criterion 1
because r(52)=.60<rc=1. Note also in Figure 10 and Table 1 that fraction of remaining
failures p=1-Q=.05 at r(52)=.60. Using the converse of the relationship in Figure 10 provides
another perspective, as shown in Figure 11, where we see that if we continue to test for an
additional TF=34 intervals, starting at interval 18, the predicted reduction in remaining
failures that would be achieved is 4.16 or r(52)=.60.
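
The OIA numbers in this walkthrough can be reproduced, to within rounding of α and β, from equations (10), (15), and (16). The sketch below is illustrative only; the function names are hypothetical and the parameters are the OIA values from Table 2.

```python
import math

ALPHA, BETA, S = 0.534, 0.061, 9   # OIA parameters from Table 2


def remaining_failures_at(t_t):
    """Equation (10)."""
    return (ALPHA / BETA) * math.exp(-BETA * (t_t - (S - 1)))


def time_to_next_n_failures(t, delta_r):
    """Equation (15); assumes its validity condition holds for the inputs used here."""
    arg = 1.0 - (BETA * delta_r / ALPHA) * math.exp(BETA * (t - S + 1))
    return -math.log(arg) / BETA


def reduction_in_remaining_failures(t, t_f):
    """Equation (16)."""
    return (ALPHA / BETA) * math.exp(-BETA * (t - S + 1)) * (1.0 - math.exp(-BETA * t_f))


r_18 = remaining_failures_at(18)          # about 4.76: criterion 1 not satisfied
r_52 = remaining_failures_at(52)          # about 0.60: criterion 1 satisfied
delta_r = r_18 - r_52                     # about 4.16 failures removed
extra_time = time_to_next_n_failures(18, delta_r)   # additional intervals of testing

print(round(r_18, 2), round(r_52, 2), round(delta_r, 2), round(extra_time, 2))
print(round(reduction_in_remaining_failures(18, 34), 2))
```

Computed this way, the additional test time is exactly 52-18=34 intervals, because equation (15) inverts equation (10); the 33.94 quoted above differs only through rounding of the inputs.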

Lastly, Figure 12 shows the Launch Decision, relevant to the Shuttle (or, generically, the
Deployment Decision), where remaining failures are plotted against total test time for OIA.
With these results in hand, the software manager can decide whether to deploy the software
depending on factors such as predicted remaining failures, as shown in Figure 12, along with
considering other factors such as the trend in reported faults over time, inspection results, etc.
If testing were to continue until tt=52, the predictions in Figure 12 and Table 1 would be
obtained. These results show that criterion 1 is now satisfied (i.e., SAFE) and operational
quality is high. We also see from Figure 12 that at this value of tt, further increases in tt
would not result in a significant increase in reliability and safety. Also note that at tt=52 it is
not feasible to make a prediction of TF(52) because the predicted remaining failures is less
than one.




Table  1 


Safety  Criteria  Assessment 
OIA 


9. Summary of Predictions and Validation

9.1  Predictions 


Table 2 shows a summary of remaining and maximum failure predictions compared with
actual failure data, where available, for OIA, OIB, OIC, and OID. Because we do not know
the actual remaining and maximum failures, we use assumption 3: remaining failures are
zero for those OIs (B, C, and D) that were executed for extremely long times (years) with
no additional failure reports; correspondingly, for these OIs, we use assumption 3 that
maximum failures equals total observed failures.


Table 2

Predicted Remaining and Maximum Failures versus Actuals

       tt    s*    α      β      r(tt)   Actual r   F(∞)    Actual F
OIA    18    9     .534   .061   4.76    ? (A)      11.76   7 (A)
OIB    20    1     1.69   .131   0.95    1 (B)      12.95   13 (B)
OIC    20    7     1.37   .126   1.87    2 (C)      12.87   13 (C)
OID    18    6     .738   .051   7.36    4 (D)      17.36   14 (D)

30 day Total Test Time Intervals

Time of last recorded failure:

A. No additional failures have been reported after 17.17 intervals.

B. The last recorded failure occurred at 63.67 intervals.

C. The last recorded failure occurred at 43.80 intervals.

D. The last recorded failure occurred at 65.03 intervals.

Table 3 shows a summary of total test time and time to next failure predictions compared
with actual execution time data, where available, for OIA, OIB, OIC, and OID.


Table  3 


Predicted  Total  Test  Time  and  Time  to  Next  Failure  versus  Actuals 


*  Cannot  predict  because  predicted  Remaining  Failures  is  less  than  one. 

Additional  Predictions  for  OID: 


The following are additional predictions of total test time for OID that are not listed
in Table 3: tt(r=2)=43.35, Actual=45.17; tt(r=3)=35.47, Actual=23.70.


Table 4 shows a summary of the predictions of time to next failure for a given reduction in
remaining failures of 1 and the predictions of reduction in remaining failures for given time
to next failure compared with actual execution time and failure data, where available, for OIA,
OIB, OIC, and OID.


Table 4

Predicted Tradeoffs of Time to Next Failure with Reduced Remaining Failures versus Actuals

       t     s*    α      β      TF(Δr=1,t)   Actual TF   TF      Δr(TF,t)   Actual Δr
OIA    18    9     .534   .061   3.87         ?           3.87    1.00       ?
OIB    20    1     1.69   .131   *            43.67       43.67   .95        1.0
OIC    20    5     1.34   .096   4.16         7.63        7.63    1.58       1.0
OID    18    5     1.61   .137   6.35         6.20        6.20    .99        1.0

30 day Total Test Time and Time to Next Failure Intervals.

* Cannot predict because predicted Remaining Failures is less than one.

9.2  Validation 


A total of 18 predictions were made across Tables 2, 3, and 4, where there was an actual
value to compare: three r(t), four F(∞), four tt, two TF(t), two TF(Δr,t), and three Δr(TF,t). The
mean relative error (mean of (actual-predicted)/actual) of prediction is 22.92% and the
standard deviation is 27.61%. In making these predictions we note both the sparsity of
post-delivery failures and the extremely long test times for Shuttle flight software, as summarized
in Table 5. See the Appendix for a listing of the failure data. Although the
Schneidewind Software Reliability Model uses optimal selection of failure data, and thus less
than the full set of data, there must be a minimum number of failures to start the parameter
estimation process, understanding that the model will then select the optimal value of s (s*).
Thus, given the sparsity of the data, all failures in Table 5 were used in parameter estimation,
regardless of their severity. Furthermore, as described earlier, a more conservative risk
assessment is produced if all categories of failures are included in the analysis.
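As a small illustration of the error measure, the following sketch computes the relative
error (actual-predicted)/actual for the two additional OID total test time predictions quoted
earlier (43.35 versus 45.17, and 35.47 versus 23.70). Taking the absolute value before
averaging is an assumption made here; the text does not state whether signed or absolute
errors were averaged. These two pairs are only a subset of the 18 predictions, so the output
is not expected to reproduce the 22.92% figure.

```python
from statistics import mean, stdev


def relative_errors(pairs):
    """Return |actual - predicted| / actual for each (predicted, actual) pair.

    The absolute value is an assumption of this sketch, not taken from the report.
    """
    return [abs(actual - predicted) / actual for predicted, actual in pairs]


# (predicted, actual) total test times for OID, as quoted in the text.
oid_pairs = [(43.35, 45.17), (35.47, 23.70)]

errors = relative_errors(oid_pairs)
print([round(e, 4) for e in errors])
print(f"mean = {mean(errors):.2%}, standard deviation = {stdev(errors):.2%}")
```

The same two functions from the standard library (`mean`, `stdev`) would apply unchanged to
the full list of 18 prediction pairs if those values were tabulated.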


Table  5 

Failure  Distribution  by  Severity  Code 


There are no post-delivery Severity 1 or 5 failures in the above Operational Increments.


APPENDIX 


Observed Failure Counts
(Interval i = 30 days execution time)

i        OIA   OIB   OIC   OID
1        0     1     0     0
2        0     1     0     0
3        0     1     0     0
4        1     2     0     0
5        0     1     0     3
6        0     0     2     1
7        0     0     1     0
8        2     2     3     1
9        0     1     1     0
10       2     0     0     1
11       0     2     0     1
12       0     0     0     0
13       0     1     1     2
14       1     0     1     0
15       0     0     0     0
16       0     0     0     0
17       0     0     1     0
18       1     0     0     1
19             0     0     0
20             0     1     0
21             0     0     0
22             0     0     0
23             0     0     0
24             0     0     1
25             0     0     0
26             0     0     0
27             0     0     0
28             0     1     0
29             0     0     0
30             0     0     0
31-63          0
64             1
31-43                0
44                   1
31-45                      0
46                         1
47-58                      0
59                         1
60-65                      0
66                         1

Totals:  7     13    13    14
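The column totals of the appendix can be checked mechanically. The sketch below encodes the
counts as read from the listing above. The assignment of the entries for intervals 19 and
later to OIB, OIC, and OID (with no OIA observations after interval 18) is an assumption
inferred from the stated totals, not something the listing states explicitly.

```python
# Failure counts per 30-day interval for intervals 1-18 (all four increments observed).
first_18 = {
    "OIA": [0, 0, 0, 1, 0, 0, 0, 2, 0, 2, 0, 0, 0, 1, 0, 0, 0, 1],
    "OIB": [1, 1, 1, 2, 1, 0, 0, 2, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0],
    "OIC": [0, 0, 0, 0, 0, 2, 1, 3, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
    "OID": [0, 0, 0, 0, 3, 1, 0, 1, 0, 1, 1, 0, 2, 0, 0, 0, 0, 1],
}

# Nonzero counts after interval 18 as {interval: count}.
# Assumed column assignment; see the lead-in paragraph.
later = {
    "OIA": {},
    "OIB": {64: 1},
    "OIC": {20: 1, 28: 1, 44: 1},
    "OID": {24: 1, 46: 1, 59: 1, 66: 1},
}

totals = {oi: sum(first_18[oi]) + sum(later[oi].values()) for oi in first_18}
print(totals)
```

Under this reading the computed totals agree with the Totals row (7, 13, 13, 14).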
We develop a quality control and prediction model for improving the quality of software
delivered by development to maintenance. This model identifies modules that require priority
attention during development and maintenance by using Boolean discriminant functions. The
model also predicts during development the quality that will be delivered to maintenance by
using both point and confidence interval estimates of quality. We show that it is important
to perform a marginal analysis when making a decision about how many metrics to include
in a discriminant function. If many metrics are added at once, the contribution of individual
metrics is obscured. Also, the marginal analysis provides an effective rule for deciding when
to stop adding metrics. We also show that certain metrics are dominant in their effects on
classifying quality and that additional metrics are not needed to increase the accuracy of
classification. Related to this property of dominance is the property of concordance, which is
the degree to which a set of metrics produces the same result in classifying software quality.
A high value of concordance implies that additional metrics will not make a significant
contribution to accurately classifying quality; hence, these metrics are redundant. Data from
the Space Shuttle flight software are used to illustrate the model process.


1.  Introduction 

A key problem in maintenance is to identify problems in the software during
development before it reaches maintenance. To this end, we develop a quality control
and prediction model that is used to identify modules that require priority attention
during development and maintenance. This is accomplished in two activities: validation
and application. Both activities occur during software development. Validation is an
activity that is required in order to identify metrics that can identify low quality
software that requires corrective action. Application is an activity during which validated
metrics are applied to control and predict software quality. During validation, we use
a build of the software that has been developed as the source of data to compute a
discriminant function (i.e., a statistical method that is used to classify software quality)
that we use to retrospectively classify and predict quality with specified accuracy, by
build and module. Using this discriminant function during application, we classify and
predict the quality of new software that is being developed. We make both point and
confidence interval estimates of quality. This is the quality we expect to experience
during maintenance.

During validation, both quality factor (e.g., discrepancy reports of deviations
between requirements and implementation) and software metrics (e.g., size, structural)
data are available; during application, only the latter are available. During validation,
we construct Boolean discriminant functions (BDFs) comprised of a set of metrics
and their critical values (i.e., thresholds). A BDF is a Boolean function consisting of
AND and OR operators, module metric values, and metric critical values that is used
to classify the quality of software. A metric critical value is a value in the range of
the metric, determined from the inverse of the Kolmogorov-Smirnov distance (to
be explained), that provides a threshold between two levels (e.g., high and low) of
the quality of the software. We select the best BDF based on its ability to achieve
the maximum relative incremental quality/cost ratio. During application, if at least
one of a module's metrics has a value that exceeds its critical value, the module
is identified as "high priority" (i.e., low quality); otherwise, it is identified as "low
priority" (i.e., high quality). Our objective is to identify and correct quality problems
during development so that a high quality product can be delivered to maintenance, as
opposed to waiting until maintenance when the cost of correction would be high.
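The application-time rule described above (a module is high priority if at least one metric
exceeds its critical value) can be sketched as follows. This is a minimal illustration, not
the author's implementation. The function name and metric keys are hypothetical, the two
critical values (38 and 26) are the ones reported later for prologue size and statements, and
the module metric values are invented for the example.

```python
def classify_module(metrics, critical_values):
    """Apply a Boolean discriminant function to one module.

    Returns "high priority" (low quality) if any metric exceeds its critical
    value (Boolean OR), otherwise "low priority" (high quality).
    """
    exceeds = any(metrics[name] > mc for name, mc in critical_values.items())
    return "high priority" if exceeds else "low priority"


# Critical values as reported later in the paper; key names are hypothetical.
critical_values = {"prologue_size": 38, "statements": 26}

# Hypothetical modules.
small_module = {"prologue_size": 12, "statements": 20}
volatile_module = {"prologue_size": 45, "statements": 20}

print(classify_module(small_module, critical_values))
print(classify_module(volatile_module, critical_values))
```

Note that a value equal to the critical value does not trigger rejection, which matches the
"exceeds its critical value" wording.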

We use nonparametric statistical methods to: (1) identify the critical values of
the metrics and (2) find the optimal BDF based on its ability to satisfy both statistical
and application criteria. Statistical criteria refer to the ability to correctly classify the
software (i.e., classify high quality software as high quality and low quality software
as low quality). Application criteria refer to the ability to achieve a high quality/cost
ratio. A BDF compares a module's metric value with the metric's critical value,
for a set of metrics, in classifying the quality of the software. The BDFs provide
good accuracy (i.e., <3% error) for classifying quality factors. These functions make
fewer mistakes in classifying software that is low quality than is the case when linear
vectors of metrics are used because the critical values provide additional information
for discriminating quality. In addition, we develop an effective stopping rule for adding
metrics to the BDF that is based on quality/cost considerations.

We show that it is important to perform a marginal analysis (i.e., identification
of the incremental contribution of each metric to improving quality) when making a
decision about how many metrics to include in the discriminant function. If many
metrics are added to the set at once, the contribution of individual metrics is obscured.
Also, the marginal analysis provides an effective rule for deciding when to stop adding
metrics. We also show that certain metrics are dominant in their effects on classifying
quality for Space Shuttle software (i.e., dominant metrics make fewer mistakes in
classifying metrics than non-dominant ones) and that additional metrics are not needed
to accurately classify quality. Related to the property of dominance is the property of
concordance, which is the degree to which a set of metrics produces the same result
in classifying software quality. A high value of concordance implies that additional
metrics will not make a significant contribution to accurately classifying quality; hence,
these metrics are redundant.




N.  F.  Schneidewind  /  Software  quality  control  and  prediction  model 


The  contributions  of  this  research  are  the  following: 

(1)  both  statistical  and  application  criteria  should  be  used  to  determine  which  metrics 
and  how  many  metrics  should  be  used  to  classify  maintenance  quality; 

(2)  a  marginal  analysis  should  be  performed  on  each  metric  to  determine  whether  its 
addition  will  increase  the  quality/cost  ratio; 

(3) the Boolean discriminant function (BDF) is a new type of discriminant for classifying
maintenance quality;

(4) our application of the Kolmogorov-Smirnov (K-S) distance is a new way to determine
a metric's critical value; and

(5)  we  have  developed  a  new  stopping  rule  for  adding  metrics:  the  ratio  of  the  relative 
improvement  in  quality  to  the  relative  increase  in  cost. 

1.1.  Related  research 

Our  model  is  one  of  a  class  of  models  concerned  with  the  classification  of  quality, 
sometimes  referred  to  as  the  identification  of  fault-prone  modules.  Porter  and  Selby 
[1990] used classification trees to partition multiple metric value space so that a sequence
of metrics and their critical values could be identified that were associated with
either  high  quality  or  low  quality  software.  This  technique  is  closely  related  to  our 
approach  of  identifying  a  set  of  metrics  and  their  critical  values  that  will  satisfy  quality 
and  cost  criteria.  However,  we  use  statistical  analysis  to  make  the  identification. 

Briand  et  al.  [1998]  used  logistic  regression  to  classify  modules  as  fault-prone 
or  not  fault-prone  as  a  function  of  various  object  oriented  metrics.  In  another  example 
of logistic regression, Khoshgoftaar and Allen [1997] used it to classify modules as
fault-prone  or  not  fault-prone  as  a  function  of  faults,  requirements,  performance,  and 
documentation  software  trouble  report  metrics.  While  one  of  our  objectives  is  similar 
-  classify  modules  as  either  high  quality  or  low  quality  -  we  derive  from  this  binary 
classification  several  predictive  continuous  quality  and  cost  metrics.  These  metrics 
are  used  to  predict  the  quality  of  software  that  will  be  delivered  by  development  to 
maintenance  and  the  cost  of  achieving  it. 

Khoshgoftaar  et  al.  [1996a]  used  nonparametric  discriminant  analysis  in  each 
iteration  of  their  military  system  project  to  predict  fault-prone  modules  in  the  next 
iteration.  This  approach  provided  an  advance  indication  of  reliability  and  the  risk 
of  implementing  the  next  iteration.  They  also  conducted  a  similar  study  involving  a 
telecommunications  application,  again  using  nonparametric  discriminant  analysis,  to 
classify modules as either fault-prone or not fault-prone [Khoshgoftaar et al. 1996b].
Our  approach  has  the  same  objective  but  we  produce  BDFs  in  terms  of  the  original 
metrics  as  opposed  to  using  density  functions  as  discriminators. 

Khoshgoftaar and Allen [1998] have also developed models for ranking modules
for reliability improvement according to their degree of fault-proneness as opposed
to whether they are fault-prone or not. They used Alberg Diagrams [Ohlsson and
Alberg 1996] that predict percentage of faults as a function of percentage of modules
by ordering modules in decreasing order of faults and noting the cumulative number
of faults corresponding to various percentages of modules. The imperative in safety
critical systems like the Space Shuttle is to investigate all suspect modules because
even the module with the lowest a priori reliability risk could pose a safety hazard
in operation. Our previous research showed a very high association between module
failures and metric values that exceeded the critical values [Schneidewind 1995], as
we will show later.

The following topics are covered: Discriminative Power model, approach to validation,
and quality control and prediction applications of the model, section 2; detailed
description of validation methodology, section 3; comparison of validation with application
results for quality control and prediction, section 4; quality point and confidence
interval estimates, section 5; comparison of BDF and linear discriminant function quality
classification results, section 6; development metric characteristics of modules that
failed during maintenance, section 7; and conclusions about the contributions of the
model to quality control and prediction and the results obtained to date in applying it
to the Space Shuttle, section 8.


2.  Discriminative  power  model 

2.1.  Discriminative  power  validation 

Using our metrics validation methodology [IEEE 1998; Schneidewind 1992], and
the Space Shuttle flight software metrics and discrepancy reports (DRs), we validate
metrics with respect to the quality factor drcount. This is the number of discrepancy
reports written against a module. In brief, this involves conducting statistical tests to
determine whether there is a high degree of association between drcount and candidate
metrics. As shown in figure 1, we validate metrics on one random sample (validation
sample) of 100 modules from Build 1 and apply the validated metrics to three random
samples (application samples) of 100 modules each from Build 2 that are both disjoint
among themselves and from the validation sample, drawn from a population of 1397
modules of Space Shuttle flight software. Nikora and Munson argue for the need of a
measurement baseline against which evolving systems may be compared [Nikora and
Munson 1998]. Our baseline is Build 1 in figure 1. The measurement results from
Build 1 provide the data source for controlling and predicting the quality delivered to
maintenance and for comparing predicted with actual quality, once the latter is known.
Next, we define Discriminative Power.

2.1.1.  Discriminative  Power 

[Figure 1. Measurement process. Development leads to maintenance of the release containing
Build 2. Build 1 (validation, sample 1) and Build 2 (application, samples 2, 3, and 4) each
have a Design and a Test phase. Build 1 yields Mij, Fi, and the critical values MCj, which are
used to control and predict the quality of Build 2. Legend: Mij: metric j on module i;
MCj: metric j critical value; Fi: quality factor on module i.]

Given the elements Mij of a matrix of n modules and m metrics (i.e., nm metric
values), the elements MCj of a vector of m metric critical values, the elements Fi of
a vector of n quality factor values, and scalar FC of quality factor critical value,
MCj must be able to discriminate with respect to Fi, for a specified FC, as shown in the
following relation:

    M_{ij} > MC_j \Leftrightarrow F_i > FC  and

    M_{ij} \le MC_j \Leftrightarrow F_i \le FC                              (1)

for i = 1, 2, ..., n, and j = 1, 2, ..., m with specified α, where α is the significance
level of various statistical tests that are used for estimating the degree to which a set of
metrics can correctly classify software quality. In other words, do the indicated metric
relations imply corresponding quality factor relations in (1)? This criterion assesses
whether MCj has sufficient Discriminative Power to be capable of distinguishing a set
of high quality modules from a set of low quality modules. If this is the case, we use
the critical values in Quality Control and Prediction described below. The validation
process is illustrated in figure 1, where the critical values MCj are produced in the Test
phase of Build 1 by using the metrics Mij from the Design phase and the quality factor
Fi (e.g., drcount) that is available in the Test phase. Discrepancy reports are written
against the software throughout development but they are not significantly complete
until the end of the Test phase for a build during which failures are observed. The
counts of discrepancy reports and metrics that are associated with a module were collected
at the completion of a build by a metrics analyzer, using the source code as input.
If a discrepancy report involves multiple modules, it is counted against every module
affected. The desired quality level is set by the choice of FC. The lower its value, the
higher the quality requirement; conversely, the higher its value, the lower the requirement.
A value of zero is appropriate for safety-critical systems like the Space Shuttle.

It is important to recognize that validation is performed retrospectively. That is,
with both metrics Mij and quality factor Fi in hand for Build 1, we can evaluate how
well the metrics would have performed if they had been applied to Build 1. If the
metrics perform well, we say they are validated and it is our expectation that they
will perform adequately when applied to Build 2 (i.e., not as well as when applied to
Build 1, because of possible differences in module characteristics between Build 1 and
Build 2, but better than using unvalidated metrics). Next, we describe the application
of the model to quality control and prediction.

2.1.2. Quality control and prediction

Quality control is the evaluation of modules with respect to predetermined critical
values of metrics. The purpose of quality control is to allow software managers to
identify software that does not meet quality requirements early in the development
process so corrective action can be taken when the cost is low. Quality control is
applied during the Design phase of Build 2 in figure 1 to flag modules below quality
limits for detailed inspection. The validated BDFs, comprised of the metrics Mij and
their critical values MCj that are obtained from Build 1, are used to either accept or
reject the modules of Build 2 [Schneidewind 1997a,b]. At this point in the development
of Build 2, only the metric data and MCj are available.

Quality  predictions  are  used  by  the  developer  and  maintainer  to  anticipate  rather 
than  react  to  quality  problems.  The  predictions  provide  indications  of  the  quality  of 
the  software  that  would  be  delivered  to  maintenance.  Figure  1  shows  the  metrics 
controlling  and  predicting  the  quality  of  software  that  will  be  delivered  to  maintenance 
early in the development of Build 2. Accompanied by rigorous inspection and test,
this  process  will  result  in  improved  quality  of  Build  2  and  the  software  that  is  released 
to  maintenance,  of  which  Build  2  is  a  part.  Once  all  of  the  quality  factor  data 
(e.g.,  drcount)  have  been  collected  for  Build  2,  at  the  end  of  the  Test  phase  as  shown 
in  figure  1,  the  quality  of  Build  2  would  be  known.  This,  then,  becomes  the  actual 
quality  of  Build  2  in  the  maintained  software. 


3.  Validation  methodology 

The  basis  of  this  model  is  a  methodology  for  validating  BDFs  and  their  critical 
values  that  have  the  ability  to  discriminate  high  quality  from  low  quality.  We  use  a 
three-stage  process  for  selecting  metrics  for  quality  control  and  prediction: 

(1) compute critical values of the candidate metrics;

(2)  for  the  set  of  candidate  metrics  and  critical  values,  find  the  optimal  combination 
based  on  statistical  and  application  criteria;  and 

(3)  apply  a  stopping  rule  for  adding  metrics. 

Table 1 provides a functional description of each stage. The three stages take
place during the Test phase of Build 1 of figure 1, once all the quality factor data
Fi (e.g., drcount) are available. The sections that follow provide the details of the
statistical analysis for each stage.

Table 1

Functional description of metrics validation process.

Stage 1
  Statistical test/procedure: Kolmogorov-Smirnov (K-S).
  Purpose: Compute the critical values of the candidate metrics.
  Result: Metrics ranked by K-S test results for input to stage 2.

Stage 2
  Statistical test/procedure: Contingency table analysis.
  Purpose: Use the critical values obtained from stage 1 to form a set of BDFs. Use
    the BDFs to estimate quality and cost of inspection for each set of metrics,
    starting with one metric, and increasing by one until the stopping rule is satisfied.
  Result: Metric sets with increasing numbers of metrics, each set with estimated
    quality and cost of inspection.

Stage 3
  Statistical test/procedure: Stopping rule for adding metrics.
  Purpose: Add metrics to stage 2 until the ratio of relative incremental quality to
    relative incremental inspection cost reaches a maximum.
  Result: Validated BDFs and their critical values that provide the highest estimated
    quality relative to the estimated cost of inspection.
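The stage 3 stopping rule can be sketched as follows. This is only an illustration under
stated assumptions: the relative increments are taken as (new - old)/old, the function name
`best_metric_count` is hypothetical, and the quality and cost figures are invented. The
paper's own definitions of estimated quality and inspection cost are not reproduced here.

```python
def best_metric_count(quality, cost):
    """Pick the number of metrics with the largest quality/cost increment ratio.

    quality[k] and cost[k] are the estimates for the BDF built from the first
    k + 1 metrics (metrics ordered by decreasing K-S distance).
    Returns (number_of_metrics, ratio).
    """
    best_count, best_ratio = 1, float("-inf")
    for k in range(1, len(quality)):
        rel_quality = (quality[k] - quality[k - 1]) / quality[k - 1]
        rel_cost = (cost[k] - cost[k - 1]) / cost[k - 1]
        if rel_cost <= 0:
            continue  # no cost increase: ratio undefined in this sketch, so skip
        ratio = rel_quality / rel_cost
        if ratio > best_ratio:
            best_count, best_ratio = k + 1, ratio
    return best_count, best_ratio


# Hypothetical estimates for BDFs with 1, 2, 3, and 4 metrics.
quality = [0.90, 0.99, 0.992, 0.993]
cost = [0.50, 0.69, 0.75, 0.80]

count, ratio = best_metric_count(quality, cost)
print(count, round(ratio, 3))
```

With these invented numbers the second metric gives by far the largest gain per unit of added
cost, so the sketch stops at two metrics; later metrics add cost with almost no quality gain.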

Table 2

Kolmogorov-Smirnov distance for drcount = 0 vs. drcount > 0. Validation sample 1 (n = 100 modules).

Metric (symbol)     Definition (counts per module)                Critical value   Distance   α       Rank
Prologue size (P)   Change history line count in module listing   38               0.585      0.005   1
Statements (S)      Executable statement count                    26               0.557      0.005   2
Eta1 (E1)           Unique operator count                         10               0.492      0.005   3
Nodes (N)           Node count (in control graph)                 11               0.487      0.005   4

3.1. Stage 1: compute critical values

Critical values MCj are computed, using a new method we have developed,
which is based on the Kolmogorov-Smirnov (K-S) test [Conover 1971]. This test was
investigated  for  application  to  software  metrics  because  of  its  ability  to  indicate  the 
value  of  a  metric  (i.e.,  critical  value)  where  maximum  discrimination  occurs  between 
two  samples  of  modules  -  one  of  high  quality  and  the  other  of  low  quality.  The 
method  has  consistently  yielded  good  results  for  controlling  the  quality  of  Space  Shuttle 
software  as  our  results  will  show.  The  K-S  test  is  exact  for  continuous  distributions 
and  conservative  (i.e.,  the  true  alpha  is  less  than  the  specified  value)  for  discrete  metrics 
data  [Conover  1971].  In  addition,  the  large  range  (e.g.,  0-2316  for  prologue  size)  and 
fine  granularity  (e.g.,  units  of  one  for  prologue  size)  of  the  metrics  data  approximate 
continuous  distributions.  Thus,  the  K-S  test  is  appropriate  for  analyzing  metrics  data. 

Table 2 shows the metric definitions, critical values MCj, and K-S distances
for four metrics of the validation sample. These metrics were selected for analysis
based on their relatively high K-S distance compared to the other candidate metrics.
The K-S method tests whether the sample cumulative distribution functions (CDFs) are
from the same or different populations. The test statistic is the maximum vertical
difference between the CDFs of two samples (e.g., the CDFs of Mij for drcount ≤ FC
and drcount > FC). If the difference is significant (i.e., α ≤ 0.005), the value of Mij
corresponding to the maximum CDF difference is used for MCj. This relationship is
expressed in equation (2). This concept is illustrated in figure 2 for the critical value
of prologue size, where we show the CDFs for drcount = 0 and drcount > 0. In this
example, the critical value is 38. This is the value of prologue size where there is the
maximum difference between the CDFs, that is, where there is the maximum discrimination
between high quality (drcount = 0 curve) and low quality (drcount > 0 curve). Metrics
are added to the BDF in the order of their decreasing K-S distance:

    K\text{-}S(MC_j) = \max\{ [CDF(M_{ij} \mid F_i \le FC)] - [CDF(M_{ij} \mid F_i > FC)] \}.      (2)

The history of changes (e.g., requirements, design, and code) and other activities
(e.g., tests, and failure and fault observations) are recorded at the beginning
of a module listing (i.e., the prologue). The number of lines in this history is called
prologue size. Because this metric records the volatility of the software, it is a
very good quality discriminator, as our results will demonstrate. A statement is an
executable statement in the HAL/S programming language that is used to code the Space
Shuttle flight software.
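The critical value selection of equation (2) can be sketched in a few lines. This is a
minimal illustration, assuming empirical CDFs evaluated at each observed metric value; it
does not perform the significance test, and the function name and the sample data are
hypothetical rather than Space Shuttle measurements.

```python
def ks_critical_value(metric_values, quality_factor, fc=0):
    """Return (critical value, K-S distance) for one metric.

    Splits modules into high quality (factor <= fc) and low quality
    (factor > fc), then finds the metric value where
    CDF(high quality) - CDF(low quality) is largest.
    """
    high = [m for m, f in zip(metric_values, quality_factor) if f <= fc]
    low = [m for m, f in zip(metric_values, quality_factor) if f > fc]
    if not high or not low:
        raise ValueError("both quality groups must be non-empty")

    def cdf(sample, x):
        # Empirical CDF: fraction of the sample at or below x.
        return sum(1 for v in sample if v <= x) / len(sample)

    best_value, best_distance = None, -1.0
    for x in sorted(set(metric_values)):
        distance = cdf(high, x) - cdf(low, x)
        if distance > best_distance:
            best_value, best_distance = x, distance
    return best_value, best_distance


# Hypothetical prologue sizes and discrepancy report counts for ten modules.
prologue = [5, 10, 20, 30, 38, 40, 60, 80, 120, 200]
drcount = [0, 0, 0, 0, 0, 1, 0, 2, 1, 3]

critical, distance = ks_critical_value(prologue, drcount)
print(critical, round(distance, 3))
```

For a significance level, an established two-sample K-S routine from a statistics library
would be the more careful choice than this hand-rolled distance.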


3.2.  Stage  2:  perform  contingency  table  analysis 
3.2.1. Validation contingency table

For each BDF identified in stage 1 we use the contingency table (see table 3)
and its accompanying χ² statistic [Conover 1971] to further evaluate the ability of
the functions to discriminate high quality from low quality, from both statistical (e.g.,
values of χ² and α) and application (e.g., ability of the metric set to correctly classify
low quality modules) standpoints. In table 3, MCj and FC classify modules into
one of four categories. The left column contains modules where none of the metrics
exceeds its critical value; this condition is expressed with a Boolean AND function
of the metrics. This is the ACCEPT column, meaning that according to the classification
decision made by the metrics, these modules have acceptable quality. The
right column contains modules where at least one metric exceeds its critical value; this
condition is expressed by a Boolean OR function of the metrics. This is the REJECT
column, meaning that according to the classification decision made by the metrics,
these modules have unacceptable quality. The top row contains modules that are high
quality; these modules have a quality factor that does not exceed its critical value (e.g.,
drcount = 0). The bottom row contains modules that are low quality; these modules
have a quality factor that exceeds its critical value (e.g., drcount > 0).

Equation (3) gives the algorithms for making the cell counts of modules, using
the BDFs of Fi and Mij that are computed over the n modules for m metrics. This
equation is an implementation of the relation given in (1).

    C_{11} = COUNT FOR ((F_i \le FC) \land (M_{i1} \le MC_1) \land \cdots \land (M_{im} \le MC_m)),

    C_{12} = COUNT FOR ((F_i \le FC) \land ((M_{i1} > MC_1) \lor \cdots \lor (M_{im} > MC_m))),
                                                                                          (3)
    C_{21} = COUNT FOR ((F_i > FC) \land (M_{i1} \le MC_1) \land \cdots \land (M_{im} \le MC_m)),

    C_{22} = COUNT FOR ((F_i > FC) \land ((M_{i1} > MC_1) \lor \cdots \lor (M_{im} > MC_m))),

for j = 1, ..., m, and where

    COUNT(i) = COUNT(i - 1) + 1   FOR Boolean expression true,
    COUNT(i) = COUNT(i - 1)       otherwise;

    COUNT(0) = 0.

The counts correspond to the cells of the contingency table (C11, C12, C21, and
C22), as shown in table 3, where row and column totals are also shown: n, n1, n2, N1,
and N2. The analysis could be generalized to include multiple quality factors, if
necessary; in this case, the contingency table would have more than two rows.

Table 3

Validation contingency table.

                          ∧(Mij ≤ MCj)          ∨(Mij > MCj)
                          Pi ≤ 38 ∧ Si ≤ 26     Pi > 38 ∨ Si > 26
High quality
Fi ≤ FC, drcount = 0      C11 = 30              C12 = 27 (type 2)      n1 = 57
Low quality
Fi > FC, drcount > 0      C21 = 1 (type 1)      C22 = 42               n2 = 43
                          N1 = 31               N2 = 69                n = 100
                          RF = 1, RFM = 1                              TF = 192
                          ACCEPT                REJECT
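The cell counts of the validation contingency table can be produced by a single pass over
the modules. The sketch below is an illustration, not the author's tool: the function name,
the metric keys "P" and "S", and the five example modules are hypothetical, while the
critical values 38 and 26 and FC = 0 are the ones used in table 3.

```python
def contingency_counts(modules, critical_values, fc=0):
    """Count modules in each cell of the 2x2 validation contingency table.

    modules: list of (metrics dict, quality factor) pairs.
    Returns (c11, c12, c21, c22) with rows = high/low quality and
    columns = ACCEPT/REJECT.
    """
    c11 = c12 = c21 = c22 = 0
    for metrics, factor in modules:
        # REJECT when at least one metric exceeds its critical value (Boolean OR).
        reject = any(metrics[name] > mc for name, mc in critical_values.items())
        low_quality = factor > fc
        if not low_quality and not reject:
            c11 += 1
        elif not low_quality and reject:
            c12 += 1  # type 2 misclassification
        elif low_quality and not reject:
            c21 += 1  # type 1 misclassification
        else:
            c22 += 1
    return c11, c12, c21, c22


critical_values = {"P": 38, "S": 26}

# Hypothetical modules: (metric values, drcount).
modules = [
    ({"P": 10, "S": 12}, 0),
    ({"P": 50, "S": 12}, 0),
    ({"P": 20, "S": 20}, 2),
    ({"P": 90, "S": 40}, 5),
    ({"P": 30, "S": 30}, 1),
]

print(contingency_counts(modules, critical_values))
```

The ACCEPT condition is simply the negation of the REJECT condition, so the AND form and the
OR form of the BDF never need to be evaluated separately.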

In addition to counting modules in table 3, we must also count the quality factor
(e.g., drcount) that is incorrectly classified. This is shown as Remaining Factor, RF,
in the ACCEPT column. This is the quality factor count on modules that should
have been rejected. Also shown is Total Factor, TF, the total quality factor count
on all the modules in the sample (i.e., the sum of drcount). Lastly we show RFM
(Remaining Factor Modules) that is the count of modules with quality factor count > 0
(i.e., modules with Remaining Factor, RF).

Table  3  and  subsequent  equations  show  an  example  validation,  where  the  optimal 
combination  of  metrics  from  table  2  and  their  critical  values  for  a  random  sample  of 
100  modules  (sample  1),  from  the  population  of  1397,  is  prologue  size  ( P )  with  a 
critical  value  of  38  and  statements  (S)  with  a  critical  value  of  26.  This  low  value  of 
statements  is  understandable  because  the  median  value  in  the  builds  analyzed  is  23. 
There  are  many  small  modules  that  call  a  subroutine,  compute  a  value,  and  transfer 
control  to  another  module.  Later  we  will  explain  how  we  arrived  at  this  particular 
combination  of  metrics  as  the  optimal  set. 

3.2.2.  Statistical  criteria 

We validate a BDF statistically by demonstrating that it partitions table 3 in
such a way that C11 and C22 are large relative to C12 and C21. If this is the case, a
large number of high quality modules (e.g., modules with drcount = 0) would have
Mij ≤ MCj and would be correctly classified as high quality. Similarly, a large number
of low quality modules (e.g., modules with drcount > 0) would have Mij > MCj and
would be correctly classified as low quality. One measure of the degree to which this
is the case is estimated by the chi-square (χ²) statistic [Conover 1971]. If computed
χ²c > χ²s (chi-square at specified αs) and if computed αc < αs, then these results
suggest that a given BDF can discriminate between high and low quality. However,
because the χ² test may not produce consistent results [Eman 1998], we use it only as
one of several indicators of Discriminative Power. Other criteria are misclassification
rates and, most important, application criteria (see below). We note that the use of




N.F. Schneidewind / Software quality control and prediction model


chi-square  and  alpha  as  statistical  criteria  is  independent  of  the  application  (i.e.,  these 
criteria  could  be  used  whether  the  application  is  metrics  or  personnel  management). 
Application  criteria,  on  the  other  hand,  such  as  quality  and  inspection  (see  below)  are 
meaningful  in  the  context  of  the  metrics  application. 
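The χ² value reported for the P, S metric set in table 4 can be checked with a short calculation. The following Python sketch is illustrative only: the function name `chi_square_2x2` is hypothetical, the cell C11 = 30 is derived from table 3 as n1 - C12 rather than quoted, and the use of the Yates continuity correction is an assumption (the text does not state which form of the statistic was computed, although the corrected form appears to agree with the tabulated value).

```python
import math


def chi_square_2x2(c11, c12, c21, c22, yates=True):
    """Chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    n = c11 + c12 + c21 + c22
    diff = abs(c11 * c22 - c12 * c21)
    if yates:
        # Continuity correction (an assumption, see the lead-in); never go negative.
        diff = max(diff - n / 2.0, 0.0)
    denom = (c11 + c12) * (c21 + c22) * (c11 + c21) * (c12 + c22)
    chi2 = n * diff ** 2 / denom
    # Upper-tail probability of chi-square with 1 d.f.: erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Sample 1 counts from table 3; C11 = 57 - 27 = 30 is derived, not quoted.
chi2_c, alpha_c = chi_square_2x2(c11=30, c12=27, c21=1, c22=42)
print(f"chi-square = {chi2_c:.1f}, alpha = {alpha_c:.1e}")
```

Under these assumptions the computed statistic and significance level can be compared directly with the χ²c and αc columns of table 4.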

3.2.2.1. Misclassification

We compute the degree of misclassification in table 3 by noting that ideally
C11 = n1 = N1, C12 = 0, C21 = 0, C22 = n2 = N2. The extent to which this is not the
case is estimated by type 1 misclassifications (i.e., the module has low quality and the
metrics “say” it has high quality) and type 2 misclassifications (i.e., the module has
high quality and the metrics “say” it has low quality). Thus, we define the following
measures of misclassification:

Proportion of modules of type 1:

$$P_1 = \frac{C_{21}}{n}. \tag{4}$$

Proportion of modules of type 2:

$$P_2 = \frac{C_{12}}{n}. \tag{5}$$

Proportion of modules of type 1 + type 2:

$$P_{12} = \frac{C_{21} + C_{12}}{n}. \tag{6}$$

For the example, P1 = (1/100) · 100 = 1%, P2 = (27/100) · 100 = 27%, P12 =
((1 + 27)/100) · 100 = 28%.
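Equations (4)-(6) can be sketched in a few lines of Python. The function name `misclassification_rates` is hypothetical and the results are expressed as percentages, as in the worked example; this is an illustration under those assumptions, not the tooling used in the study.

```python
def misclassification_rates(c12, c21, n):
    """Return (P1, P2, P12) as percentages of the n sampled modules.

    c21: low quality modules that were accepted (type 1 misclassifications)
    c12: high quality modules that were rejected (type 2 misclassifications)
    """
    p1 = 100.0 * c21 / n
    p2 = 100.0 * c12 / n
    return p1, p2, p1 + p2


# Sample 1 counts from table 3.
rates = misclassification_rates(c12=27, c21=1, n=100)
print(rates)
```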

3.2.3.  Application  criteria 

It  is  insufficient  to  validate  only  with  respect  to  statistical  criteria.  In  the  final 
analysis,  it  is  the  performance  of  the  metrics  in  the  application  context  that  counts. 
Therefore, we validate metrics with respect to the application criteria: quality and
inspection, which are related to the quality achieved and the cost to achieve it,
respectively [Schneidewind 1997a,b]. At the Design phase of Build 2 in figure 1, we predict
that the quality computed by equations (7)-(12) will be delivered to maintenance,
assuming that the modules that are rejected by the quality control process are inspected
and  tested  and  that  the  problems  that  are  found  are  corrected.  Furthermore,  we  predict 
that  the  degree  of  inspection  computed  by  equation  (13)  will  be  required  to  achieve 
this  quality. 


3.2.3.1. Quality

First, we estimate the ability of the metrics to correctly classify quality, given
that the quality is known to be low: the proportion of low quality (e.g., drcount > 0)
modules correctly classified:

$$\mathrm{LQC} = \frac{C_{22}}{n_2}. \tag{7}$$

For the example, LQC = (42/43) · 100 = 97.7%.

Second, we estimate the ability of the metrics to correctly classify quality, given
that the BDF has classified modules as ACCEPT. This is done by summing the quality
factor in the ACCEPT column in table 3 to produce Remaining Factor, RF (e.g.,
remaining drcount), given by equation (8):

$$\mathrm{RF} = \sum_{i=1}^{n} F_i \ \mathrm{FOR}\ \bigl((F_i > FC) \wedge (M_{i1} \le MC_1) \wedge \cdots \wedge (M_{ij} \le MC_j) \wedge \cdots \wedge (M_{im} \le MC_m)\bigr), \quad \text{for } j = 1, \ldots, m. \tag{8}$$

This is the sum of quality factor Fi (e.g., drcount) on modules incorrectly classified
as high quality because (Fi > FC) ∧ (Mij ≤ MCj) for these modules. We assume
that the elements of Fi are additive and that the lower its value, the higher the quality
of the module. This would be the case for any quality factor of interest in this analysis:
discrepancy report count, error count, fault count, and failure count.

We estimate the proportion of RF by equation (9), where TF is the total quality
factor Fi for the validation sample:

$$\mathrm{RFP} = \frac{\mathrm{RF}}{\mathrm{TF}}. \tag{9}$$

For the example, from table 3 there is one DR on one module that is incorrectly
classified (i.e., RF = 1). The total number of DRs for the 100 modules is 192.
Therefore, RFP = (1/192) · 100 = 0.52%.

We estimate the density of RF by equation (10):

$$\mathrm{RFD} = \frac{\mathrm{RF}}{n}. \tag{10}$$

For the example, RFD = 1/100 = 0.01 drcount/module.

In addition, we estimate the count of modules that were incorrectly classified
because they have DRs written against them (i.e., have Fi > FC). The proportion
remaining, RMP, is given by equation (11). Note that RMP = P1 (proportion of type 1
misclassifications) when FC = 0 (i.e., the only modules with Fi > 0 will be in the
C21 cell); see table 3.

$$\mathrm{RMP} = \frac{\mathrm{RFM}}{n}, \tag{11}$$

where RFM is given by

$$\mathrm{RFM} = \mathrm{COUNT}\ \mathrm{FOR}\ \bigl((F_i > 0) \wedge (M_{i1} \le MC_1) \wedge \cdots \wedge (M_{ij} \le MC_j) \wedge \cdots \wedge (M_{im} \le MC_m)\bigr), \quad \text{for } j = 1, \ldots, m. \tag{12}$$

For the example, there is one accepted module with one DR, so RMP = (1/100) · 100 = 1%.
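The quality criteria of equations (7) and (9)-(11) reduce to simple ratios of the table 3 counts. The Python sketch below is illustrative only; the function name `quality_criteria` and the dictionary keys are hypothetical, and LQC, RFP, and RMP are returned as percentages to match the worked examples.

```python
def quality_criteria(c22, n2, rf, tf, rfm, n):
    """Application quality criteria computed from contingency-table counts."""
    return {
        "LQC": 100.0 * c22 / n2,  # equation (7): low quality modules correctly classified
        "RFP": 100.0 * rf / tf,   # equation (9): share of quality factor on accepted modules
        "RFD": rf / n,            # equation (10): quality factor per module
        "RMP": 100.0 * rfm / n,   # equation (11): share of modules accepted with factor > 0
    }


# Sample 1 values from table 3.
criteria = quality_criteria(c22=42, n2=43, rf=1, tf=192, rfm=1, n=100)
for name, value in criteria.items():
    print(f"{name} = {value:.2f}")
```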


3.2.3.2. Inspection

Inspection is one of the costs of high quality. We are interested in weighing
inspection requirements (i.e., percent of modules rejected and subjected to detailed
inspection) against the quality that is achieved, for various BDFs. We estimate
inspection requirements by noting that all modules in the REJECT column of table 3 must
be inspected; this is the count C12 + C22. Thus, the proportion of modules that must
be inspected is given by

$$I = \frac{C_{12} + C_{22}}{n}. \tag{13}$$

For the example, I = ((27 + 42)/100) · 100 = 69% and the percentage accepted is
1 - I = 31%.

Table 4
Discriminative Power validity evaluation (sample 1, n = 100 modules).

               Critical values               Statistical criteria                    Application criteria
Metric set     P      S      E1     N        P1 (%)  P2 (%)  χ²c    αc for χ²c       LQC (%)  RFP (%)  RMP (%)  I (%)
P              38                            2       21      33.2   8.4 × 10^-9      95.3     1.56     2        62
P, S           38     26                     1       27      26.7   2.4 × 10^-7      97.7     0.52     1        69
P, S, E1       38     26     10              1       30      22.5   2.1 × 10^-6      97.7     0.52     1        72
K-S distance   0.585  0.557  0.492  0.487

P: prologue size, S: statements, E1: eta1, N: nodes


3.2.4. Summary of validation results

The  results  of  the  validation  example  are  summarized  in  table  4.  The  properties  of 
dominance  and  concordance  are  evident  in  these  validation  results  and  in  other  samples 
we  have  analyzed  from  this  data.  That  is,  a  point  is  reached  in  adding  metrics  where 
Discriminative  Power  is  not  increased  because:  (1)  the  contribution  of  the  dominant 
metrics  in  correctly  classifying  quality  has  already  taken  effect,  and  (2)  additional 
metrics  essentially  replicate  the  classification  results  of  the  dominant  metrics  -  the 
concordance  effect.  This  result  is  due  to  the  property  of  the  BDF  used  as  an  OR 
function,  which  will  cause  a  module  to  be  rejected  if  only  one  of  the  module’s  metrics 
exceeds  its  critical  value.  These  effects  can  only  be  observed  if  a  marginal  analysis  is 
performed,  where  metrics  are  added  to  the  set  one-by-one  and  the  calculations  shown 
in  table  4  are  made  after  each  metric  is  added.  For  each  added  metric,  its  effect  is 
evaluated  with  respect  to  both  statistical  and  application  criteria.  In  addition,  a  suitable 
stopping  rule  must  be  used  to  know  when  to  stop  adding  metrics  (see  the  next  section). 
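The OR behaviour of the BDF and the marginal analysis described above can be illustrated with a small Python sketch. Everything in it is hypothetical: the function names `bdf_reject` and `contingency`, and the four made-up modules, are assumptions chosen only to show how adding a metric moves a module from the ACCEPT column to the REJECT column.

```python
def bdf_reject(metrics, critical_values):
    """BDF used as an OR function: reject if any metric exceeds its critical value."""
    return any(metrics[name] > cv for name, cv in critical_values.items())


def contingency(modules, critical_values, fc=0):
    """Count modules into the cells C11, C12, C21, C22 of the contingency table."""
    counts = {"C11": 0, "C12": 0, "C21": 0, "C22": 0}
    for metrics, factor in modules:
        row = "2" if factor > fc else "1"                             # 2 = low quality
        col = "2" if bdf_reject(metrics, critical_values) else "1"    # 2 = REJECT
        counts["C" + row + col] += 1
    return counts


# Hypothetical modules: (metric values, drcount).
modules = [
    ({"P": 20, "S": 10}, 0),  # high quality, accepted
    ({"P": 50, "S": 10}, 0),  # high quality, rejected (type 2)
    ({"P": 30, "S": 20}, 1),  # low quality, accepted (type 1)
    ({"P": 30, "S": 40}, 3),  # low quality, rejected only once S is added
]

# Marginal analysis: add metrics one at a time and recompute the table.
with_p = contingency(modules, {"P": 38})
with_p_s = contingency(modules, {"P": 38, "S": 26})
print(with_p)
print(with_p_s)
```

In this toy data the fourth module is accepted when only P is used and rejected once S is added, which is the kind of marginal effect that table 4 records for each added metric.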

3.3.  Stage  3:  Apply  a  stopping  rule  for  adding  metrics 

One rule for stopping the addition of metrics to a BDF is to quit when RFP no
longer decreases as metrics are added. This is the maximum quality rule. This rule is
illustrated in table 4. When a third metric, eta1 (E1), is added, there is no decrease
in RFP and RMP nor is there an increase in LQC. If it is important to strike a balance
between quality and cost (i.e., between RFP and I), we add metrics until the ratio of
the relative change in RFP to the relative change in I is maximum, as given by the
Quality Inspection Ratio (QIR) in equation (14), where i refers to the previous RFP
and I:

$$\mathrm{QIR} = \frac{|\Delta \mathrm{RFP}| / \mathrm{RFP}_i}{\Delta I / I_i}. \tag{14}$$

For the example,

$$\mathrm{QIR}(P \to P,S) = \frac{|0.52 - 1.56| / 1.56}{(69 - 62)/62}.$$

This is the value of QIR in going from one metric, prologue size (P), to two metrics
(P, S), adding statements (S).

Also, QIR(P, S → P, S, E1) = 0. This is the value of QIR in going from two
metrics (P, S) to three metrics (P, S, E1), adding eta1 (E1).

Therefore, we stop adding metrics after statements has been added. In this
particular case, equation (14) produces the same metric set as the maximum quality rule.

Table 5
Application contingency table.

                ∧(Mij ≤ MCj)           ∨(Mij > MCj)
                Pi ≤ 38 ∧ Si ≤ 26      Pi > 38 ∨ Si > 26
High quality    ?                      Type 2  ?              ?
Low quality     Type 1  ?              ?                      ?
                N'1 = 40               N'2 = 60               n = 100
                ACCEPT                 REJECT
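Equation (14) can be evaluated directly from consecutive rows of table 4. The Python sketch below is illustrative; the function name `qir` and the error raised when inspection does not change are assumptions, since the text does not discuss that case.

```python
def qir(rfp_prev, rfp_new, i_prev, i_new):
    """Quality Inspection Ratio: relative change in RFP over relative change in I."""
    delta_i = i_new - i_prev
    if delta_i == 0:
        # Not covered by the text; treated here as undefined (an assumption).
        raise ValueError("inspection did not change, QIR is undefined")
    return (abs(rfp_new - rfp_prev) / rfp_prev) / (delta_i / i_prev)


# RFP (%) and I (%) from table 4.
qir_p_to_ps = qir(rfp_prev=1.56, rfp_new=0.52, i_prev=62, i_new=69)
qir_ps_to_pse1 = qir(rfp_prev=0.52, rfp_new=0.52, i_prev=69, i_new=72)
print(f"QIR(P -> P,S) = {qir_p_to_ps:.2f}")
print(f"QIR(P,S -> P,S,E1) = {qir_ps_to_pse1:.2f}")
```

With these inputs the first ratio is roughly 5.9 and the second is zero, which is consistent with stopping after statements has been added.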


4.  Comparison  of  validation  with  application  results 

In  order  to  compare  validation  with  application  results,  we  first  show  how  the 
Contingency  table  looks  at  the  Design  phase  of  Build  2  in  figure  1,  when  only  the 
metrics Mij and their critical values MCj are available. This is shown in table 5, where
the “?” indicates that the quality factor data Fi are not available when the validated
metrics  are  used  in  the  quality  control  function  of  Build  2.  During  the  Design  phase 
of  Build  2,  modules  are  classified  according  to  the  criteria  that  have  been  described. 
A  second  disjoint  random  sample  of  100  modules  (sample  2)  was  used  to  illustrate 
the  process.  Whereas  31  and  69  modules  were  accepted  and  rejected,  respectively, 
during  Build  1,  40  and  60  modules  were  accepted  and  rejected,  respectively,  during 
Build  2.  The  rejected  modules  would  be  given  priority  attention  (i.e.,  subjected  to 
rigorous  inspection). 

A comparison of the validation sample (Build 1) with the application samples
(Build 2) with respect to statistical criteria is shown in table 6. A comparison of the
validation sample with the application samples with respect to application criteria is
shown in tables 7 and 8. As we have mentioned, only metrics data is available when the
validated metrics are applied during the Design phase of Build 2 in figure 1. However,
to have a basis for comparison with the validation results, we computed the values
shown in tables 6-8 retrospectively (i.e., after Build 2 was far enough along to be able
to collect all of the quality factor data at the conclusion of the Test phase). The values
for samples 2-4 in tables 7 and 8 are the actual quality delivered to maintenance, as
shown during the Test phase of figure 1. The reader should compare the results of
samples 2-4 with those of sample 1 in the tables. As the accuracy of classification
of low quality software increases, the accuracy of classifying high quality software
decreases and inspection cost increases. However, the more important consideration
is to prevent low quality software from being delivered to maintenance, particularly in
safety critical systems like the Space Shuttle.

Table 6
Statistical criteria P1 and P2 for metric set: P, S. Validation (sample 1) vs. application (samples 2-4),
n = 100 modules.

                                           Sample 1  Sample 2  Sample 3  Sample 4
P1: percentage type 1 misclassification    1.0       1.0       4.0       3.0
P2: percentage type 2 misclassification    27.0      24.0      18.0      22.0

Table 7
Application criteria LQC and RFP for metric set: P, S. Validation (sample 1) vs. application (samples
2-4), n = 100 modules.

                                                          Sample 1  Sample 2  Sample 3  Sample 4
LQC: percentage of low quality modules (drcount > 0)
     correctly classified                                 97.7      97.3      91.1      93.2
RFP: percentage of quality factor (drcount)
     incorrectly classified                               0.52      0.62      3.01      1.50

Table 8
Application criteria RFD and I for metric set: P, S. Validation (sample 1) vs. application (samples 2-4),
n = 100 modules.

                                                          Sample 1  Sample 2  Sample 3  Sample 4
RFD: density of quality factor (drcount/module)
     incorrectly classified                               0.01      0.01      0.05      0.03
I: percentage of modules inspected                        69        60        59        63


5.  Quality  point  and  confidence  interval  estimates 

In addition to the quantities in tables 3-8, there are other quantities of interest,
such as the proportion of modules with zero and non-zero drcount and their confidence
intervals. For these quantities, software developers and maintainers are provided with
both point estimates and interval estimates of the range in which the actual quality
values are likely to fall. Thus, they are able to anticipate rather than react to quality
problems. For example, estimates obtained from Build 1 in figure 1 are used to predict
the quality of software that would be delivered to maintenance if corrective action were
not taken. This action is the quality control step of the Design phase of Build 2, where
modules are rejected and subjected to detailed inspection and test if their metrics values
exceed the critical values. In addition, the estimates provide indications of resource
levels that are needed to achieve quality goals. For example, if the predicted quality
of the software were lower than the specified quality, the difference would be an
indication of increased usage of personnel and computer time during inspection and
testing, respectively.

A benefit of using confidence limits is that they provide protection against
prediction error. A prediction error could arise because the very act of measuring and
predicting may affect the predictions - the Heisenberg Principle. For example,
prologue size, the record of change history, has proven to be a good predictor of quality.
However, if the software is changed in response to problems observed during the
quality control function, thereby adding to the change history and prologue size, this effect
would tend to make the original predictions optimistic. Another protection against
prediction error is to periodically repeat the predictions as the software evolves over
the life cycle.

The  normal  approximation  to  the  binomial  distribution  is  used  to  estimate  the 
confidence  limits  of  the  proportions.  This  distribution  is  used  because  we  are  interested 
in  estimating  the  proportions  of  modules  and  drcount  that  fall  into  one  of  two  categories 
(i.e.,  a  module  is  either  accepted  or  rejected  or  DRs  are  either  present  or  not  present 
on  a  module).  The  normal  approximation  gives  the  mean  proportion  p  of  modules  or 
DRs  that  fall  into  one  of  two  categories  and  the  confidence  limits  are  a  function  of  p. 

The point and confidence limit estimates for module and quality factor counts
use terms that are defined below. Where it is necessary to distinguish validation from
application quantities in the computations, we use primed notation for the latter.

n: number of modules in the validation and application samples (see tables 3 and 5,
respectively).

N1: number of modules accepted in the validation sample of Build 1.

N2: number of modules rejected in the validation sample of Build 1.

N'1: number of modules accepted in the application samples of Build 2.

N'2: number of modules rejected in the application samples of Build 2.

5.1. Module counts

Module count estimates are made using the validation sample in the Test phase
of Build 1. These estimates are applied to the application samples in the Design phase
of Build 2 and compared with actual values in table 9.

The proportion of all modules with quality factor Fi > 0 (e.g., drcount > 0 on
module i) in the entire validation sample is given by equation (15):

$$p_n = \frac{\mathrm{COUNT}_{i=1}^{n}\ \mathrm{FOR}\ (F_i > 0)}{n}, \tag{15}$$

where

$$\mathrm{COUNT}(i) = \begin{cases} \mathrm{COUNT}(i-1) + 1, & \mathrm{FOR}\ \text{expression true}, \\ \mathrm{COUNT}(i-1), & \text{otherwise}; \end{cases} \qquad \mathrm{COUNT}(0) = 0.$$

We use this equation to estimate p'n in the application samples. We obtain the two-sided
confidence interval of pn from expression (16). We use this expression to estimate the
lower and upper limits of p'n in the application samples:

$$p_n \pm Z_{\alpha/2} \sqrt{\frac{p_n (1 - p_n)}{n}}. \tag{16}$$

As shown in table 9, we would expect the proportion of all modules with drcount > 0
in maintenance to be between 33.3% and 52.7% unless corrective action is taken to make
these limits lower. If corrective action is taken, this estimate provides bounds on the
resources - personnel and computer time - that would be required to inspect, correct,
and test defective modules.

The proportion of accepted modules with quality factor Fi > 0 (e.g., drcount
> 0 on module i) in the validation sample is given by equation (17), where RFM is
obtained from equation (12):

$$p_{N_1} = \frac{\mathrm{RFM}}{N_1}. \tag{17}$$

We use this equation to estimate p'N1 in the application samples. We obtain the
one-sided upper confidence limit of pN1 from expression (18). We use this expression to
estimate the upper limit of p'N1 in the application samples:

$$p_{N_1} + Z_{\alpha} \sqrt{\frac{p_{N_1} (1 - p_{N_1})}{N_1}}. \tag{18}$$

As shown in table 9, we would expect the proportion of accepted modules with drcount
> 0 in maintenance to be ≤ 8.45% as the result of the quality control effort in the
Design phase of Build 2.
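Expressions (16) and (18) can be evaluated with the Python standard library. The sketch below is illustrative: the function name `normal_limits` and its `side` parameter are hypothetical, and the 95% confidence level is taken from the heading of table 9.

```python
import math
from statistics import NormalDist


def normal_limits(p, size, confidence=0.95, side="two"):
    """Normal-approximation confidence limits for a binomial proportion p."""
    std_err = math.sqrt(p * (1.0 - p) / size)
    if side == "two":
        z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
        return p - z * std_err, p + z * std_err
    z = NormalDist().inv_cdf(confidence)
    if side == "upper":
        return p + z * std_err
    if side == "lower":
        return p - z * std_err
    raise ValueError("side must be 'two', 'upper', or 'lower'")


# Sample 1 values from table 3: 43 of n = 100 modules have drcount > 0,
# and RFM = 1 of the N1 = 31 accepted modules has drcount > 0.
p_n = 43 / 100
low, high = normal_limits(p_n, 100)                # expression (16)
p_n1 = 1 / 31                                      # equation (17)
upper_n1 = normal_limits(p_n1, 31, side="upper")   # expression (18)
print(f"p_n: {100 * low:.1f}% to {100 * high:.1f}%")
print(f"p_N1 upper limit: {100 * upper_n1:.2f}%")
```

The same helper with `side="lower"` gives a one-sided lower limit of the kind used for the rejected-module proportion.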

The proportion of rejected modules with quality factor Fi > 0 (e.g., drcount > 0
on module i) in the validation sample is given by equation (19):

$$p_{N_2} = \frac{(p_n)(n) - \mathrm{RFM}}{N_2}. \tag{19}$$

This is equal to: (all modules with quality factor Fi > 0) minus (accepted modules
with quality factor Fi > 0), divided by the number of rejected modules. We use this
equation to estimate p'N2 in the application samples. We obtain the one-sided lower
confidence limit of pN2 from expression (20). We use this expression to estimate the
lower limit of p'N2 in the application samples:

$$p_{N_2} - Z_{\alpha} \sqrt{\frac{p_{N_2} (1 - p_{N_2})}{N_2}}. \tag{20}$$

As shown in table 9, we would expect the proportion of rejected modules with drcount
> 0 in maintenance to be ≥ 51.2% as the result of the quality control effort in the
Design phase of Build 2.

5.2.  Quality  factor  counts 

Quality factor proportion count estimates in (21)-(24) are made using the
validation sample in the Test phase of Build 1. Quality factor total count estimates
in (25) and (26) use data from the validation sample and data that is available in the
application samples in the Design phase of Build 2: number of modules accepted, N'1,
and number of modules rejected, N'2. These estimates are applied to the application
samples in the Design phase of Build 2 and compared with actual values in tables 9
and 10.

The proportion of quality factor Fi > 0 (e.g., drcount > 0) that occurs on accepted
modules in the validation sample is given by equation (21):

$$d_1 = \frac{\mathrm{RF}}{\mathrm{TF}}, \tag{21}$$

where RF is obtained from equation (8) and TF is the total quality factor F for the
validation sample. We use this equation to estimate d'1 in the application samples. We
obtain the one-sided upper confidence limit of d1 from expression (22). We use this
expression to estimate the upper limit of d'1 in the application samples:

$$d_1 + Z_{\alpha} \sqrt{\frac{d_1 (1 - d_1)}{\mathrm{TF}}}. \tag{22}$$

As shown in table 9, we would expect the proportion of drcount > 0 on accepted
modules in maintenance to be ≤ 1.38% as the result of the quality control effort in
the Design phase of Build 2.

The proportion of quality factor Fi > 0 (e.g., drcount > 0) that occurs on rejected
modules in the validation sample is given by equation (23):

$$d_2 = 1 - d_1. \tag{23}$$

We use this equation to estimate d'2 in the application samples. We obtain the one-sided
lower confidence limit of d2 from expression (24). We use this expression to estimate
the lower limit of d'2 in the application samples:

$$d_2 - Z_{\alpha} \sqrt{\frac{d_2 (1 - d_2)}{\mathrm{TF}}}. \tag{24}$$

Table 9
Validation predictions (sample 1) vs. application actual values (samples 2-4).

                                                        Point estimates  95% confidence limits  Actual values
                                                        (sample 1)       (sample 1)             Sample 2  Sample 3  Sample 4
p'n: proportion of all modules with drcount > 0         43.0%            33.3-52.7%             37.0%     45.0%     44.0%
p'N1: proportion of accepted modules with drcount > 0   3.22%            ≤ 8.45%                2.50%     9.76%     8.11%
p'N2: proportion of rejected modules with drcount > 0   60.9%            ≥ 51.2%                60.0%     69.5%     65.1%
d'1: proportion of drcount > 0 on accepted modules      0.52%            ≤ 1.38%                0.62%     3.01%     1.50%
d'2: proportion of drcount > 0 on rejected modules      99.5%            ≥ 98.6%                99.4%     97.0%     98.5%

As shown in table 9, we would expect the proportion of drcount > 0 on rejected
modules in maintenance to be ≥ 98.6% as the result of the quality control effort in
the Design phase of Build 2.
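The limits in expressions (22) and (24) use the total quality factor TF, rather than the module count, as the sample size. The Python sketch below illustrates this; the function name `factor_proportion_limits` and the returned dictionary keys are hypothetical, and the 95% confidence level is taken from table 9.

```python
import math
from statistics import NormalDist


def factor_proportion_limits(rf, tf, confidence=0.95):
    """Point estimates and one-sided limits for quality factor proportions."""
    z = NormalDist().inv_cdf(confidence)
    d1 = rf / tf          # equation (21): share of the factor on accepted modules
    d2 = 1.0 - d1         # equation (23): share of the factor on rejected modules
    # d1 * (1 - d1) equals d2 * (1 - d2), so both limits share one half-width.
    half_width = z * math.sqrt(d1 * d2 / tf)
    return {
        "d1": d1,
        "d1_upper": d1 + half_width,   # expression (22)
        "d2": d2,
        "d2_lower": d2 - half_width,   # expression (24)
    }


# Sample 1 values from table 3: RF = 1, TF = 192.
limits = factor_proportion_limits(rf=1, tf=192)
for name, value in limits.items():
    print(f"{name} = {100 * value:.2f}%")
```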

The total quality factor Fi > 0 (e.g., drcount > 0) that occurs on accepted
modules in the validation sample is given by equation (25):

$$D_1 = \frac{\mathrm{RF}}{N_1} \cdot N'_1. \tag{25}$$

We use this equation as a predictor of D'1 in the application samples. As shown in
table 10, we would expect the total drcount on accepted modules in maintenance to be
1.29, 1.32, and 1.19 for application samples 2, 3, and 4, respectively. The reason for
the three estimates of sample 1 is that each sample has a different number of accepted
modules N'1 in equation (25).

The total quality factor Fi > 0 (e.g., drcount > 0) that occurs on rejected
modules in the validation sample is given by equation (26):

$$D_2 = \frac{\mathrm{TF} - \mathrm{RF}}{N_2} \cdot N'_2. \tag{26}$$

We use this equation as a predictor of D'2 in the application samples. As shown in
table 10, we would expect the total drcount on rejected modules in maintenance to
be 166.1, 163.3, and 174.4 for application samples 2, 3, and 4, respectively. The
reason for the three estimates of sample 1 is that each sample has a different number
of rejected modules N'2 in equation (26).
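Equations (25) and (26) scale the validation densities by the number of modules accepted and rejected in each application sample. The Python sketch below is illustrative; the function name `predicted_totals` is hypothetical, and the accepted and rejected counts for samples 3 and 4 are derived from the inspection percentages in table 8 (59 and 63 rejected of 100) rather than quoted in the text.

```python
def predicted_totals(rf, tf, n1, n2, n1_app, n2_app):
    """Predicted total quality factor on accepted and rejected modules."""
    d1_total = rf / n1 * n1_app          # equation (25)
    d2_total = (tf - rf) / n2 * n2_app   # equation (26)
    return d1_total, d2_total


# Validation sample 1 (table 3).
validation = dict(rf=1, tf=192, n1=31, n2=69)

# (accepted, rejected) per application sample; samples 3 and 4 are derived values.
application_counts = {2: (40, 60), 3: (41, 59), 4: (37, 63)}

predictions = {
    sample: predicted_totals(n1_app=accepted, n2_app=rejected, **validation)
    for sample, (accepted, rejected) in application_counts.items()
}
for sample, (d1_total, d2_total) in predictions.items():
    print(f"sample {sample}: D1 = {d1_total:.2f}, D2 = {d2_total:.1f}")
```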

Ten of the actual values out of the fifteen cases in table 9 fall within the confidence
limits. The average relative error across six comparisons between sample 1 versus
samples 2-4 in table 10 is 28.9% with a standard deviation of 30.7%. Variation in
results may be caused by sampling error (i.e., in order to obtain disjoint samples, it
was necessary to sample without replacement).

Table 10
Validation actual values and predictions (sample 1) vs. application actual values (samples 2-4).

                                           Actual    Estimate  Actual    Estimate  Actual    Estimate  Actual
                                           sample 1  sample 1  sample 2  sample 1  sample 3  sample 1  sample 4
D'1: total drcount on accepted modules     1         1.29      1         1.32      5         1.19      3
D'2: total drcount on rejected modules     191       166.1     160       163.3     161       174.4     197

Table 11
Comparison of Boolean Discriminant Function (BDF) with Linear Discriminant Function (LDF). Validity
evaluation (sample 1, n = 100 modules).

                         Statistical criteria                       Application criteria
Function   Metric set    P1 (%)  P2 (%)  χ²c    αc for χ²c          LQC (%)  I (%)
BDF        P, S          1.0     27.0    26.7   2.4 × 10^-7         97.7     69.0
LDF        9 metrics     9.0     9.0     37.5   5:0                 79.1     43.0

LDF metric set (counts per module): Halstead eta1, eta2, η1, and η2; lines of code, prologue size, nodes,
paths, and maximum path.


6.  Comparison  of  Boolean  and  linear  discriminant  functions 

We compared the quality classifying ability during validation of the Boolean
discriminant function (BDF) with an alternate method: the linear discriminant function
(LDF) consisting of the summation across metrics of the product of standardized
metrics variables and standardized classification coefficients [Jobson 1992]. For the BDF,
we used the optimal metrics set - prologue size and statements - and results obtained
from table 4. For the LDF, we used the set of nine metrics listed in table 11 and a
marginal analysis that yielded the highest Discriminative Power as measured by the
eigenvalue and χ². The comparison is shown in table 11. In the comparison, we used
both statistical and application criteria. In the application category, we did not compute
RFP and RMP for the LDF as we did in table 4. Unlike the BDF, where equations
(8) and (9) count quality factor and (11) and (12) count modules that are misclassified
into the ACCEPT category, there is no algorithm for making these computations
for the LDF. It would have been necessary to compare the metrics and drcount for
each module with the LDF to determine how the metrics classified the modules and
drcount. However, a good comparison is obtained by using LQC. In this example,
table 11 shows that the BDF does a better job of classifying the low quality modules
(e.g., lower value of P1 and higher value of LQC) and that the LDF does a better job of
classifying the high quality modules (e.g., lower values of P2 and I). As stated in
section 1, the reason for this result is that BDFs make fewer mistakes in classifying
software that is low quality than is the case when linear vectors of metrics are used
because the critical values provide additional information for discriminating quality.
The implication for applying the validated metrics during the quality control function
of the Design phase of Build 2 is that the BDF would yield higher quality and the
LDF would yield lower cost. Our preference is the BDF in a safety critical system
like the Space Shuttle, where high quality software is the paramount objective.

Table 12
Metric characteristics of failed modules.

Failure  Severity  Module  Prologue  Statements  Eta1   Nodes  drcount
number   level     ID      size
1        2         13      493       738         46     394    22
2        3         974     299       192         31     98     2
3        2         1286    115       110         28     48     5
4        3         711     205       1           5      96     6
5        3         1300    82        3           8      20     1
6        3         515     851       875         44     529    15
7        2         464     69        15          16     12     4
7        2         465     76        30          24     21     4
7        2         466     68        15          16     12     4
7        2         467     72        30          24     21     2
7        2         468     153       10          11     75     3
7        2         472     100       1           6      40     1
8        4         555     943       819         34     174    26
9                  904     122       128         31     64     1
10       4         882     157       107         30     51     5

Critical value             38        26          10     11     0
Failed modules mean        253.7     204.9       23.6   110.3  6.7
Build 2 mean               134.6     70.2        16.7   28.4   1.8


7.  Metric  characteristics  of  failed  modules 

Further evidence of the model’s ability to identify low quality during development
is shown in table 12. This table shows the 15 modules that failed during maintenance
of the 1397 modules of Build 2 in figure 1, where the severity of the 10 failures
decreases from 2 to 4. In the case of failure #7, six modules caused this failure. The
table also shows the module metrics and validated critical values that were obtained
during Build 1. For all failed modules, one or more of their metric values exceed the
critical value. Metric values in italics would fail to reject these modules during quality
control of the Design phase of Build 2. However, this would be compensated for by
the metric prologue size that would have correctly rejected all of these modules. To
illustrate the difference in metric characteristics of the failed modules versus all the
modules of Build 2, the means of each were computed. The difference in means is
significant at α < 0.05. As this example illustrates, although a metrics program can
alert the developer to the possibility of unreliable software, it cannot prevent failures
from occurring. In this example, the inspection and test process failed to find and
correct the problems before Build 2 entered maintenance.
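The claim that prologue size alone would have rejected every failed module can be checked against the prologue size column of table 12. The short Python sketch below is illustrative; the variable names are hypothetical, and the values are transcribed from table 12 in row order.

```python
from statistics import mean

# Prologue size of the 15 failed modules, in the row order of table 12.
prologue_sizes = [493, 299, 115, 205, 82, 851, 69, 76, 68, 72, 153, 100, 943, 122, 157]
PROLOGUE_CRITICAL_VALUE = 38  # validated critical value from Build 1

# A module is rejected by the BDF when a metric exceeds its critical value.
rejected = [size > PROLOGUE_CRITICAL_VALUE for size in prologue_sizes]
failed_mean = mean(prologue_sizes)

print(f"rejected by prologue size: {sum(rejected)} of {len(prologue_sizes)}")
print(f"mean prologue size of failed modules: {failed_mean:.1f}")
```

The computed mean can be compared with the "Failed modules mean" row of table 12.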


8.  Conclusions 

A model was developed for controlling and predicting the quality of software that
is delivered by development to maintenance. The model provides software developers
and maintainers with both point estimates and interval estimates of the range in which
the actual quality values are likely to fall. Thus, they are alerted to the need to take
corrective action.

It is important when validating and applying metrics to consider both statistical
and application criteria and to measure the marginal contribution of each metric in
satisfying these criteria. When this approach is used, we observe that a point is reached
where adding metrics makes no contribution to improving quality and the cost of
using additional metrics increases. This phenomenon is due to the metric classification
properties of dominance and concordance. Using our approach, we achieved an error
of [value illegible] in classifying quality factors for the samples used in the study. The ratio of
the relative improvement in quality to the relative increase in inspection cost is a new
and effective stopping rule for adding metrics.
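
The stopping rule can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the study: the function names, the way quality and inspection cost are scored (any positive numeric scale), the example numbers, and the threshold of 1.0 are all hypothetical.

```python
def improvement_to_cost_ratio(quality_before, quality_after, cost_before, cost_after):
    """Ratio of the relative improvement in quality to the relative
    increase in inspection cost when one more metric is added."""
    relative_quality_gain = (quality_after - quality_before) / quality_before
    relative_cost_increase = (cost_after - cost_before) / cost_before
    if relative_cost_increase == 0:
        raise ValueError("inspection cost did not change; ratio is undefined")
    return relative_quality_gain / relative_cost_increase


def should_add_metric(ratio, threshold=1.0):
    """Hypothetical decision rule: keep adding metrics only while the
    ratio stays at or above the chosen threshold."""
    return ratio >= threshold


# Hypothetical example: quality score rises from 0.90 to 0.92 while the
# fraction of modules inspected rises from 0.20 to 0.30.
ratio = improvement_to_cost_ratio(0.90, 0.92, 0.20, 0.30)
print(f"ratio = {ratio:.3f}, add metric: {should_add_metric(ratio)}")
```

With these illustrative numbers the quality gain is small relative to the added inspection cost, so the rule says to stop adding metrics.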

Our Boolean discriminant function (BDF) is a new type of discriminant for
classifying software quality to support an integrated approach to control and prediction in
one model, and our application of the Kolmogorov-Smirnov distance is a new way
to determine a metric's critical value. On this application, the BDF, using two
metrics, was superior to a linear discriminant function, using nine metrics, in classifying
low quality software; however, when used for quality control, the BDF requires more
inspection.
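
The two ideas in this paragraph can be sketched in a few lines. The sketch assumes that the critical value of a metric is taken at the point where the empirical distributions of the metric for high quality and low quality modules differ most (the Kolmogorov-Smirnov distance), and that the BDF rejects a module when any metric exceeds its critical value. The function names, the second metric name, and the sample data are hypothetical.

```python
def ks_critical_value(high_quality_values, low_quality_values):
    """Return (critical_value, distance): the metric value at which the
    empirical CDFs of the two samples are farthest apart."""
    if not high_quality_values or not low_quality_values:
        raise ValueError("both samples must be non-empty")
    candidates = sorted(set(high_quality_values) | set(low_quality_values))
    best_value, best_distance = None, -1.0
    for v in candidates:
        cdf_high = sum(x <= v for x in high_quality_values) / len(high_quality_values)
        cdf_low = sum(x <= v for x in low_quality_values) / len(low_quality_values)
        distance = abs(cdf_high - cdf_low)
        if distance > best_distance:
            best_value, best_distance = v, distance
    return best_value, best_distance


def bdf_rejects(module_metrics, critical_values):
    """Boolean discriminant (assumed OR form): reject the module if any
    metric exceeds its critical value."""
    return any(module_metrics[name] > critical_values[name] for name in critical_values)


# Hypothetical metric samples for modules of known quality.
critical, distance = ks_critical_value([1, 2, 3, 4], [5, 6, 7, 8, 3])
critical_values = {"prologue_size": critical, "statements": 40}
print(bdf_rejects({"prologue_size": 6, "statements": 12}, critical_values))
```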

Finally, with a very limited sample of modules that caused failures, we found that
the validated metrics, if they had been applied to the modules that eventually failed,
would have acted as early indicators of these failures.


Acknowledgements 

We wish to acknowledge the support provided for this project by Dr. William
Farr of the Naval Surface Warfare Center and Dr. Allen Nikora of the Jet Propulsion
Laboratory; the data provided by Prof. John Munson of the University of Idaho; the
data and assistance provided by Ms. Julie Barnard of United Space Alliance; the
helpful comments of Dr. Linda Rosenberg of NASA, Goddard; and the anonymous
reviewers.




N.F.  Schneidewind  /  Software  quality  control  and  prediction  model 


References 

Briand, L.C., J. Daly, V. Porter, and J. Wüst (1998), “Predicting Fault-Prone Classes with Design
Measures in Object-Oriented Systems,” In Proceedings of the Ninth International Symposium on Software
Reliability Engineering, IEEE Computer Society Press, Los Alamitos, CA, pp. 334-343.

Conover, W.J. (1971), Practical Nonparametric Statistics, Wiley, New York.

Emam, K.E. (1998), “The Predictive Validity Criterion for Evaluating Binary Classifiers,” In Proceedings
of the Fifth International Metrics Symposium, IEEE Computer Society Press, Los Alamitos, CA, pp.
235-244.

IEEE Standard for a Software Quality Metrics Methodology (1998), Revision, IEEE Std 1061, IEEE
Standards Office, Piscataway, NJ.

Jobson, J.D. (1992), Applied Multivariate Data Analysis, Vol. II, Springer-Verlag, New York.

Khoshgoftaar, T.M. and E.B. Allen (1997), “Logistic Regression Modeling of Software Quality,”
TR-CSE-97-24, Department of Computer Science & Engineering, Florida Atlantic University, Boca Raton,
FL.

Khoshgoftaar, T.M. and E.B. Allen (1998), “Predicting the Order of Fault-Prone Modules in Legacy
Software,” In Proceedings of the Ninth International Symposium on Software Reliability Engineering,
IEEE Computer Society Press, Los Alamitos, CA, pp. 344-353.

Khoshgoftaar, T.M., E.B. Allen, R. Halstead, and G.P. Trio (1996a), “Detection of Fault-Prone Software
Modules During a Spiral Life Cycle,” In Proceedings of the International Conference on Software
Maintenance, IEEE Computer Society Press, Los Alamitos, CA, pp. 69-76.

Khoshgoftaar, T.M., E.B. Allen, K. Kalaichelvan, and N. Goel (1996b), “Early Quality Prediction: A
Case Study in Telecommunications,” IEEE Software, 13, 1, 65-71.

Nikora, A.P. and J.C. Munson (1998), “Determining Fault Insertion Rates for Evolving Software Systems,”
In Proceedings of the Ninth International Symposium on Software Reliability Engineering, IEEE
Computer Society Press, Los Alamitos, CA, pp. 306-315.

Ohlsson, N. and H. Alberg (1996), “Predicting Fault-Prone Software Modules in Telephone Switches,”
IEEE Transactions on Software Engineering, 22, 12, 886-894.

Porter, A.A. and R.W. Selby (1990), “Empirically Guided Software Development Using Metric-Based
Classification Trees,” IEEE Software, 7, 2, 46-54.

Schneidewind, N.F. (1992), “Methodology for Validating Software Metrics,” IEEE Transactions on
Software Engineering, 18, 5, 410-422.

Schneidewind, N.F. (1995), “Work in Progress Report: Experiment in Including Metrics in a Software
Reliability Model,” In Proceedings of the Annual Oregon Workshop on Software Metrics, Portland
State University, Portland, OR, 17 pages.

Schneidewind, N.F. (1997a), “Software Metrics Model for Quality Control,” In Proceedings of the
International Metrics Symposium, IEEE Computer Society Press, Los Alamitos, CA, pp. 127-136.

Schneidewind, N.F. (1997b), “Software Metrics Model for Integrating Quality Control and Prediction,”
In Proceedings of the International Symposium on Software Reliability Engineering, IEEE Computer
Society Press, Los Alamitos, CA, pp. 402-415.




The  Ruthless  Pursuit  of  the  Truth  about  COTS 


Dr. Norman F. Schneidewind
Naval Postgraduate School
2822 Racoon Trail
Pebble Beach
California, 93953, USA
Email: nschneid@nps.navy.mil


Abstract 

We  expose  some  of  the  truths  about 
COTS,  discounting  some  exaggerated  claims 
about  the  applicability  of  COTS,  particularly  with 
regard  to  using  COTS  in  safety  critical  systems. 
Although  we  agree  that  COTS  has  great  potential 
for  reduced  development  and  maintenance  time 
and  cost,  we  feel  that  the  advocates  of  COTS  have 
not  adequately  addressed  some  critical  issues 
concerning  reliability,  maintainability, 
availability,  requirements  risk  analysis,  and  cost. 
Thus  we  illuminate  these  issues,  suggesting 
solutions  in  cases  where  solutions  are  feasible  and 
leaving  some  questions  unanswered  because  it 
appears  that  the  questions  cannot  be  answered  due 
to  the  inherent  limitations  of  COTS.  These 
limitations  are  present  because  there  is  inadequate 
visibility  and  documentation  of  COTS 
components. 

Introduction 

In  this  paper  we  analyze  three  important 
aspects  of  COTS  software:  1)  reliability, 
maintainability,  and  availability;  2)  requirements 
risk  assessment,  using  risk  factors  from  the  Space 
Shuttle  and  modifying  them  for  more  general  use; 
and  3)  cost  framework.  We  are  motivated  to 
address  these  issues  because  we  feel  that  the 
COTS  community  has  not  adequately  addressed 
some  very  important  questions  concerning  the 
applicability  of  COTS  when  used  in  a  host 
system.  We  define  a  host  system  as  follows:  it 
contains  both  COTS  and  non-COTS  software;  the 
latter  is  specific  to  the  operational  mission  of  the 
organization;  and  the  mission  cannot  be  satisfied 
entirely  by  COTS  components.  Our  concerns  are 
reinforced by Kohl: “The most significant
challenges of V&V of COTS products has to do
with knowledge of the functionality, performance
and quality of these products. Because these
products tend to be developed for large,
commercial markets as opposed to being
developed to a specification for a single customer,
they tend to provide a variety of useful and
desirable features for the market that they are
targeted for, at the expense of the specific system
needs in which such products may be used.
Further, quality and reliability are sometimes not
considered critical when time-to-market is a
driving requirement. Thus, it is sometimes the
case that these COTS products contain features
and functionality that may not be fully known,
even to the vendor.” [KOH99].

Many  vendors  produce  products  that  are 
not  domain  specific  (e.g.,  network  server)  or  have 
limited  functionality  (e.g.,  mobile  phone).  In 
contrast,  many  customers  of  COTS  develop 
systems  that  are  domain  specific  (e.g.,  target 
tracking  system)  and  have  great  variability  in 
functionality  (e.g.,  corporate  information  system). 
This  discussion  takes  the  viewpoint  of  how  the 
customer  can  ensure  the  quality  of  COTS 
components.  In  addition  to  direct  quality 
evaluation,  we  also  consider  requirements  risk 
analysis  in  a  later  section,  which  indirectly  affects 
quality. We must distinguish between using a
non-mission critical application like a spreadsheet
program  to  produce  a  budget  and  a  mission 
critical  application  like  military  strategic  and 
tactical  operations.  Whereas  customers  will 
tolerate  an  occasional  bug  in  the  former,  zero 
tolerance  is  the  rule  in  the  latter.  We  emphasize 
the  latter  because  this  is  the  arena  where  there  are 
major  unresolved  problems  in  the  application  of 
COTS.  Furthermore,  COTS  components  may  be 
embedded  in  host  systems.  These  components 
must  be  reliable,  maintainable,  and  available,  and 
must  interoperate  with  the  host  system  in  order  for 
the  customer  to  benefit  from  the  advertised 
advantages  of  lower  development  and 
maintenance  costs.  Interestingly,  when  the  claims 
of COTS advantages are closely examined, one
finds that to a great extent these COTS
components consist of hardware and office
products, not mission critical software [CLE97].

Obviously,  COTS  components  are  different 
from  host  components  with  respect  to  one  or  more 
of  the  following  attributes:  source,  development 
paradigm,  safety,  reliability,  maintainability, 
availability, security, and other attributes.
However,  the  important  question  is  whether  they 
should  be  treated  differently  when  deciding  to 
deploy  them  for  operational  use;  we  suggest  the 
answer  is  no.  We  use  reliability  as  an  example  to 
justify  our  answer.  In  order  to  demonstrate  its 
reliability,  a  COTS  component  must  pass  the  same 
reliability  evaluations  as  the  host  components, 
otherwise  the  COTS  components  will  be  the 
weakest  link  in  the  chain  of  components  and  will 
be  the  determinant  of  software  system  reliability. 
The  challenge  is  that  there  will  be  less  information 
available  for  evaluating  COTS  components  than 
for  host  components  but  this  does  not  mean  we 
should  despair  and  do  nothing.  Actually,  there  is  a 
lot  we  can  do  even  in  the  absence  of 
documentation  on  COTS  components  because  the 
customer  will  have  information  about  how  COTS 
components  are  to  be  used  in  the  host  system.  To 
illustrate our approach, we will consider the
reliability, maintainability, and availability
(RMA) of COTS components as used in host
systems.

In  addition,  COTS  suppliers  should  consider 
increasing  visibility  into  their  products  to  assist 
customers  in  determining  the  components’  fitness 
for  use  in  a  particular  application.  We  offer  ideas 
about  information  that  would  be  useful  to 
customers  and  what  vendors  might  do  to  provide 
it. 

This  paper  is  organized  as  follows:  reliability, 
maintainability,  availability,  requirements  risk 
analysis,  improved  visibility  into  COTS,  cost  as 
the  universal  COTS  metric,  and  conclusions. 

Reliability 

There  are  some  intriguing  questions 
concerning  how  to  evaluate  the  reliability  of 
COTS components that we will attempt to answer
[SCH991]. Among these are the following: How
do we estimate the reliability of COTS when there
is no data available from the vendor? How do we
estimate the reliability of COTS when it is
embedded in a host system? How do we revise
our reliability estimates once COTS has been
upgraded? A fundamental problem arises in
assessing  the  reliability  of  a  software  component: 
a  software  component  will  exhibit  different 
reliability  performance  in  different  applications 
and  environments.  A  COTS  component  may  have 
a  favorable  reliability  rating  when  operated  in 
isolation  but  a  poor  one  when  integrated  in  a  host 
system.  What  is  needed  is  the  operational  profile 
of  COTS  components  as  integrated  into  the  host 
system  in  order  to  provide  some  clues  as  to  how  to 
test  COTS  components.  We  will  assume  the 
worst-case  situation  that  documentation  and 
source  code  are  not  available.  Thus,  inspection 
would  not  be  feasible  and  we  would  have  to  rely 
exclusively  on  testing  and  reliability  calculations 
derived  from  test  data  to  assess  reliability. 

The  operational  profile  identifies  the 
criticality  of  components  and  their  duration  and 
frequency  of  use.  Establishing  the  operational 
profile  leads  to  a  strategy  of  what  to  test,  with 
what  intensity,  and  for  what  duration.  We  must 
recognize  that  a  COTS  component  must  be  tested 
with  respect  to  both  its  operational  profile  and  the 
operational  profile  of  the  host  system  of  which  it 
is  a  part.  The  COTS  component  would  be  treated 
like  a  black  box  for  testing  purposes  similar  to  a 
host  component  being  delivered  by  design  to 
testing  but  without  the  documentation.  Testing  the 
COTS  components  according  to  these  operational 
profiles  will  produce  failure  data  that  can  be  used 
for  two  purposes:  1)  make  an  empirical  reliability 
assessment  of  COTS  components  in  the 
environment  of  the  host  system  and  2)  provide 
data  for  estimating  the  parameters  of  a  reliability 
model for predicting future reliability [SCH97].
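
One way to turn an operational profile into a test strategy is sketched below. The weighting scheme (frequency of use multiplied by a criticality weight), the operation names, and all numbers are assumptions made for illustration; the paper does not prescribe a particular allocation rule.

```python
def allocate_test_hours(profile, total_hours):
    """Split total_hours across operations in proportion to
    frequency of use times criticality (an assumed weighting)."""
    weights = {op: p["frequency"] * p["criticality"] for op, p in profile.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("profile must contain positive weights")
    return {op: total_hours * w / total_weight for op, w in weights.items()}


def empirical_failure_rate(failures, test_hours):
    """Failures observed per hour of black box testing."""
    if test_hours <= 0:
        raise ValueError("test_hours must be positive")
    return failures / test_hours


# Hypothetical profile of a COTS component as used inside a host system.
profile = {
    "track_update": {"frequency": 0.7, "criticality": 3},
    "status_report": {"frequency": 0.3, "criticality": 1},
}
hours = allocate_test_hours(profile, total_hours=100)
for operation, h in hours.items():
    print(f"{operation}: {h:.1f} test hours")
print(f"failure rate: {empirical_failure_rate(2, hours['track_update']):.4f} per hour")
```

The failure counts collected this way are the kind of test data that would feed both the empirical assessment and the parameter estimation for a reliability model.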

A  comprehensive  software  reliability 
engineering process is described in [ANS93]. As
pointed out by Voas, black box and operational
testing alone may be inadequate [VOA98]. In
addition,  he  advocates  using  fault  injection  to 
corrupt  one  component  (e.g.,  COTS  component) 
to  see  how  well  other  components  (e.g.,  the  host 
system)  can  tolerate  the  failed  component.  While 
this  approach  can  identify  problems  in  the 
software,  it  cannot  fix  them  without 
documentation.  Thus  there  must  be  a  contract  with 
the  vendor  that  allows  the  customer  to  report 
problems  to  the  vendor  for  their  resolution. 
Unfortunately,  from  the  customer’s  standpoint, 
vendors  are  unlikely  to  agree  to  such  an 
arrangement  unless  the  customer  has  significant 
leverage, as the Federal Government does. In the
case where documentation is available, it would
be  subjected  to  a  formal  inspection  of  its 
understandability  and  usability.  If  the 
documentation  satisfies  these  criteria,  it  would  be 
used  as  an  aid  to  inspecting  any  source  code  that 
might  be  available.  Next  we  consider  COTS 
maintainability  issues. 

Maintainability 

In  the  case  of  maintainability,  there  are  more 
intriguing  issues.  Suppose  a  problem  occurs  in  a 
host  system.  Is  the  problem  in  COTS  or  in  the 
host  software?  Suppose  it  is  caused  by  an 
interaction  of  the  two.  The  customer  knows  the 
problem  has  occurred,  but  does  not  know  how  to 
fix  it  if  there  is  no  documentation.  The  vendor,  not 
being  on  site,  does  not  know  the  problem  has 
occurred.  Even  the  vendor  may  not  know  how  to 
fix  the  problem  if  the  source  of  the  problem  is  the 
host  software  or  an  interaction  between  it  and 
COTS  components.  In  addition,  suppose  the 
customer  needs  to  upgrade  the  host  software  and 
this  upgrade  is  incompatible  with  the  COTS 
components.  Or,  conversely,  the  vendor  upgrades 
COTS  components  and  they  are  no  longer 
compatible  with  the  host  software.  Lastly,  suppose 
there  are  no  incompatibilities,  but  the  customer 
may  be  forced  to  install  the  latest  COTS 
components  upgrade  in  order  to  continue  to 
receive  support  from  the  vendor.  None  of  these 
situations  can  be  resolved  without  either  the 
customer  having  documentation  to  aid  in  fixing 
the  problem,  or  a  contract  with  the  vendor  of  the 
type  mentioned  above.  As  in  the  case  of 
reliability,  when  neither  of  these  remedies  is 
available,  problems  can  only  be  identified  but  they 
cannot  be  fixed.  Thus  the  software  cannot  be 
maintained.  An  additional  factor  that  impacts  both 
reliability  and  maintainability  is  that  the  vendor  is 
unlikely  to  continue  to  support  the  software  if  the 
customer  modifies  it.  Thus  the  situation 
degenerates  to  one  in  which  the  customer  is  totally 
dependent  on  vendor  support  to  achieve  reliability 
and  maintainability  objectives.  This  may  be 
satisfactory  for  office  product  applications  but  it  is 
unsatisfactory  for  mission  critical  applications. 
Next  we  consider  the  COTS  availability  issues. 

Availability 

High availability is crucial to the success of a
mission critical system. What will system
availability be using COTS? To attempt to answer
this  question,  it  is  useful  to  consider  hardware  as  a 
frame  of  reference.  The  ultimate  COTS  is 
hardware;  it  has  interchangeable  and  replacement 
components.  Maintenance  costs  are  kept  low  and 
availability  is  kept  high  by  replacing  failed 
components with identical components. Unlike
hardware availability, software availability cannot
be kept high by “replacing” the software. A failed component
cannot  be  replaced  because  the  replacement 
component  would  have  the  same  fault  as  the  failed 
component.  Fault  tolerant  software  is  a  possibility 
but  it  has  had  limited  success.  We  see  that 
availability  is  a  function  of  reliability  and 
maintainability  as  related  by  the  formula: 

Availability = MTTF / (MTTF + MTTR) = 1 / (1 + MTTR/MTTF),

where  MTTF  is  mean  time  to  failure  and  MTTR  is 
mean  time  to  repair.  MTTF  is  related  to  reliability 
and  MTTR  is  related  to  maintainability.  For  high 
availability,  we  want  to  drive  time  to  failure  to 
infinity  and  repair  time  to  zero.  However,  we 
have  seen  from  the  discussion  of  reliability  and 
maintainability  that  achieving  these  objectives  is 
problematic.  Thus  to  achieve  high  availability, 
either  the  COTS  software  must  be  of  high  intrinsic 
reliability  -  probably  a  naive  assumption  -  or 
there  must  be  in  place  a  strong  vendor 
maintenance  program  (this  assumption  may  be 
equally  naive).  Next  we  consider  COTS  visibility 
issues. 
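
The availability formula can be checked numerically with a short sketch. The function name and the example values of MTTF and MTTR (in hours) are hypothetical; only the formula itself comes from the text.

```python
def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR), with both times in the same unit."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf / (mttf + mttr)


# Hypothetical values: a failure every 1000 hours on average.
for repair_hours in (100, 10, 1):
    a = availability(1000, repair_hours)
    equivalent = 1 / (1 + repair_hours / 1000)  # second form of the formula
    print(f"MTTR = {repair_hours:>3} h: availability = {a:.4f} ({equivalent:.4f})")
```

Both forms give the same value, and availability approaches 1 only as the ratio MTTR/MTTF approaches zero, which is why either high intrinsic reliability or fast vendor repair is needed.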

Improved  Visibility  into  COTS 

Major  drawbacks  of  including  COTS  in  a 
software  system  are  the  lack  of  visibility  into  how 
the  COTS  components  were  developed  and  an 
incomplete  understanding  of  the  components’ 
behavioral  properties  [SCH991].  Without  this 
information,  it  is  difficult  to  assess  COTS 
components  to  determine  their  fitness  for  a 
particular application. As suggested by McDermid
in [TAL98], a partial solution might be for COTS
vendors to identify a set of behavioral properties
that should be satisfied by the software, and then
to certify that those properties are satisfied. For
instance, an operating system supplier might
certify that a lower-priority task does not interrupt
a higher-priority task as long as the higher-priority
task holds the resources required to continue
processing. COTS vendors might also include the
specifications of those components as well as
details of verification activities in which those
specifications  had  been  used  to  show  that  specific 
behavioral  properties  of  the  software  were 
satisfied.  For  instance,  an  effort  in  progress  at  the 
Jet  Propulsion  Laboratory  [JPL98]  involves 
developing  libraries  of  reusable  specifications  for 
spacecraft  software  components  using  the  PVS 
specification  language  [SRI98].  The  developers  of 
the  libraries  work  cooperatively  with  anticipated 
customers  to  develop  the  specifications  and 
identify  those  properties  that  the  components 
should  satisfy.  As  they  develop  the  libraries,  the 
component developers use the PVS theorem
prover to show that the behavioral properties are
satisfied  by  the  specification.  These  proofs  are 
intended  to  be  distributed  with  the  libraries.  When 
customers  modify  the  libraries,  perhaps  to 
customize  them  for  a  new  mission,  they  will  be 
able  to  use  the  accompanying  proofs  as  a  basis  for 
showing  that  the  modified  specification  exhibits 
the  desired  behavioral  properties.  Similarly, 
commercial  vendors  could  work  with  existing  and 
potential  customers  through  user  groups  to 
discover  those  behavioral  properties  in  which 
users  are  the  most  interested,  and  then  work  to 
certify  that  their  components  satisfy  those 
properties.  Next  we  present  a  methodology  for 
analyzing  requirements  risk  when  COTS  is 
embedded  in  a  host  system. 

Requirements  Risk  Analysis 

In  this  section  we  first  describe  the  Shuttle 
risk  management  process.  Then  we  consider  how 
it  could  be  modified  to  accommodate  the  use  of 
COTS.  In  providing  this  analysis,  it  should  not  be 
inferred  that  we  necessarily  advocate  the  use  of 
COTS  on  the  Shuttle  or  on  any  other  safety 
critical  system.  Whether  COTS  should  be 
employed  would  depend  upon  many 
environmental  and  application  factors.  Rather,  our 
goal  is  to  investigate  whether  the  Shuttle  risk 
analysis  process  is  adaptable  to  the  use  of  COTS. 

Shuttle  Risk  Management  Process 

One  of  the  software  development  and 
maintenance  problems  of  the  NASA  Space  Shuttle 
Flight  Software  organization  is  to  evaluate  the  risk 
of  implementing  requirements  changes.  These 
changes  can  affect  the  reliability,  availability  and 
maintainability  of  the  software.  To  assess  the  risk 
of  change,  a  number  of  risk  factors  are  used.  The 
risk factors were identified by agreement between
NASA and the development contractor based on
assumptions  about  the  risk  involved  in  making 
changes  to  the  software.  This  formal  process  is 
called  a  risk  assessment.  No  requirements  change 
is  approved  by  the  change  control  board  without 
an  accompanying  risk  assessment.  During  risk 
assessment,  the  development  contractor  will 
attempt  to  answer  such  questions  as:  “Is  this 
change  highly  complex  relative  to  other  software 
changes  that  have  been  made  on  the  Shuttle?”  If 
this  were  the  case,  a  high-risk  value  would  be 
assigned  for  the  complexity  criterion.  To  date  this 
qualitative  risk  assessment  has  proven  useful  for 
identifying  possible  risky  requirements  changes 
or,  conversely,  providing  assurance  that  there  are 
no  unacceptable  risks  in  making  a  change. 

The  following  are  the  definitions  of  the  risk 
factors,  where  we  have  placed  the  factors  into 
categories  and  have  provided  our  interpretation  of 
the  question  the  factor  is  designed  to  answer.  In 
addition,  we  added  the  risk  factor  requirements 
specifications  techniques  because  we  feel  that  this 
one  could  represent  the  highest  reliability  risk  of 
all  the  factors  if  a  technique  leads  to 
misunderstanding  of  the  intent  of  the 
requirements.  For  each  of  the  risk  factors,  we 
analyze  its  appropriateness  for  COTS.  As  you  will 
see,  this  analysis  not  only  determines  the 
adaptability  of  the  process  to  COTS,  but  also 
exposes  some  serious  issues  in  the  employment  of 
COTS in any system. For example, the Shuttle
risk  process  is  all  about  assessing  the  risk  of 
requirements  changes.  In  COTS,  we  would  not 
want  to  attempt  changes  because  we  don’t  have 
the  necessary  source  code  and  other 
documentation.  Furthermore,  if  we  did  make  a 
change,  it  could  invalidate  our  software  license. 
This  situation  illuminates  a  serious  deficiency  in 
using  COTS.  Therefore,  our  only  recourse,  if 
feasible,  is  to  change  the  host  software  to  reflect 
the change. In other words, COTS has to be used
“as is” in our system. Thus, in what follows, the
risk  factors  are  a  function  of  the  change  in  the 
host  software  and  how  the  change  relates  to  and 
can  be  integrated  with  COTS. 

In  order  to  modify  the  Shuttle  risk  process  to 
make  it  applicable  to  the  use  of  COTS,  we  must 
change  the  software  change  metric  from  lines  of 
code  to  components.  In  addition,  we  must  change 
our  view  of  the  software  from  a  set  of  individual 
instructions  to  a  set  of  interconnected 
components. Otherwise, it would make no sense
to talk about the number of lines of code to be
changed  in  the  host  software  when  we  only  have 
visibility  of  COTS  at  the  component  level.  We 
will  also  assume  an  object  oriented  development 
and  maintenance  paradigm. 

Requirements  Change  Risk  Factors 

The  following  are  the  definitions  of  the 
Shuttle  risk  factors  modified  to  accommodate  the 
use  of  COTS,  where,  as  mentioned  previously, 
only  host  software  components  can  be  changed, 
but  in  making  the  changes,  the  relationship  with 
COTS  components  must  be  considered.  If  the 
answer  to  a  yes/no  question  is  "yes",  it  means  this 
is  a  high-risk  change  with  respect  to  the  given 
factor.  If  the  answer  to  a  question  that  requires  an 
estimate  is  an  anomalous  value,  it  means  this  is  a 
high-risk  change  with  respect  to  the  given  factor. 
When  a  change  to  a  component  is  mentioned 
below,  it  will  be  understood  to  be  a  change  to  host 
software. 
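
The two classification rules just stated can be sketched as follows. The text does not define "anomalous", so the sketch assumes it means falling outside a typical range observed for past changes; the factor names, the ranges, and the function name are hypothetical, while the two numeric estimates are the values shown in Table 1.

```python
def high_risk_factors(answers, typical_ranges):
    """Return the factors for which a proposed change is high risk.

    answers: factor -> bool (yes/no question) or number (estimate).
    typical_ranges: factor -> (low, high) for numeric factors only.
    """
    flagged = []
    for factor, answer in answers.items():
        if isinstance(answer, bool):
            # Rule 1: a "yes" answer means high risk for this factor.
            if answer:
                flagged.append(factor)
        else:
            # Rule 2: an estimate outside the typical range is treated
            # as anomalous (assumed definition) and therefore high risk.
            low, high = typical_ranges[factor]
            if not (low <= answer <= high):
                flagged.append(factor)
    return flagged


answers = {
    "highly_complex": True,
    "conflicts_with_other_changes": False,
    "modifications_of_change_request": 7,
    "requirements_issues": 238,
}
typical_ranges = {  # hypothetical ranges from past changes
    "modifications_of_change_request": (0, 10),
    "requirements_issues": (0, 50),
}
print(high_risk_factors(answers, typical_ranges))
```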

Complexity  Factors 

o  Qualitative  assessment  of  complexity  of 
change  (e.g.,  very  complex) 

-  Is  this  change  highly  complex  relative  to 
other  software  changes  that  have  been  made 
on  the  system?  What  are  the  interfaces 
between  the  host  components  and  COTS 
components  that  are  affected  by  the  change? 
Is  the  change  more  complex  for  the  host 
system  than  for  the  host  software  alone? 

o  Number  of  modifications  or  iterations  on  the 
proposed  change 

-  How  many  times  must  the  change  be 
modified  or  presented  to  the  Change  Control 
Board  (CCB)  before  it  is  approved? 

Size  Factors 

o  Number  and  types  of  components  affected  by 
the  change 

-  How  many  components  and  types  of 
components  must  be  changed  to  implement 
the  requirements  change? 

o  Size  of  software  components  that  are  affected 
by  the  change 


-  How  many  component  objects  are  affected 
by  the  change? 

Criticality  of  Change  Factors 

o  Whether  the  software  change  is  on  a  nominal  or 
off-nominal  component  path  (i.e.,  exception 
condition) 

-  Will  a  change  to  an  off-nominal  component 
path  affect  the  reliability  of  the  software? 

o  Operational  phases  affected  by  the  changed 
component  path  (e.g.,  ascent,  orbit,  and 
landing) 

-  Will  a  change  to  a  critical  phase  of  the 
mission  (e.g.,  ascent  and  landing)  affect  the 
reliability  of  the  software? 

Locality  of  Change  Factors 

o  The  area  of  the  affected  change  (i.e.,  critical 
area  such  as  a  component  path  for  a  mission 
abort  sequence) 

-  Will the change affect objects of
components  that  are  critical  to  mission 
success? 

o  Recent  changes  to  components  in  the  area 
affected  by  the  requirements  change 

-  Will  successive  changes  to  the  components 
in  a  given  area  lead  to  non-maintainable  code? 

o  New  or  existing  components  that  are  affected 

-  Will  a  change  to  new  components  (i.e.,  a 
change  on  top  of  a  change)  lead  to  non- 
maintainable  software? 

o  Number  of  system  or  hardware  failures  that 
would  have  to  occur  before  the  components 
that  implement  the  requirement  are  executed 

-  Will  the  change  be  on  a  component  path 
where  only  a  small  number  of  system  or 
hardware  failures  would  have  to  occur  before 
the changed components are executed?

Requirements Issues and Function Factors

o Number and types of other requirements
affected  by  the  given  requirement  change 
(requirements  issues) 

-  Are  there  other  requirements  that  are  going 
to  be  affected  by  this  change?  If  so,  these 
requirements  will  have  to  be  resolved  before 
implementing  the  given  requirement. 

o  Possible  conflicts  among  requirements 
changes  (requirements  issues) 

-  Will  this  change  conflict  with  other 
requirements  changes  (e.g.,  lead  to  conflicting 
operational scenarios)?

o  Number  of  principal  software  functions  and 
components  affected  by  the  change 

-  How  many  major  software  functions  and 
components  will  have  to  be  changed  to  make 
the  given  change? 

Performance  Factors 

o  Amount  of  memory  required  to  implement  the 
change 

-  Will  the  change  use  memory  to  the  extent 
that  other  functions  and  components  will  not 
have  sufficient  memory  to  operate 
effectively? 

o  Effect  on  CPU  performance 

-  Will  the  change  use  CPU  cycles  to  the  extent 
that  other  functions  and  components  will  not 
have  sufficient  CPU  capacity  to  operate 
effectively? 


Personnel Resources Factors

o  Number  of  inspections  of  components  and 
objects  required  to  approve  the  change 

-  Will  the  number  and  duration  of  inspections 
be  significant? 

o  Manpower  required  to  implement  the  change 

-  Will  the  manpower  required  to  implement 
the  software  change  be  significant? 

o  Manpower  required  to  verify  and  validate  the 
correctness  of  the  change 

-  Will  the  manpower  required  to  verify  and 
validate  the  software  change  be  significant? 

Tools  Factor 

o  Software  tools  creation  or  modification 
required  to  implement  the  change 

-  Will  the  implementation  of  the  change 
require  the  development  and  testing  of  new 
tools  -  for  example  the  development  of 
component  and  object  testing  tools? 

o  Requirements  specifications  techniques  (e.g., 
flow  diagram,  state  chart,  pseudo  code,  control 
diagram). 

-  Will  the  requirements  specification  method 
be  difficult  to  understand  and  translate  into 
components  and  objects? 

As an example, Table 1 shows a partial list of the
risk factors compiled for the Shuttle Three
Engine Out Auto Contingency and Single Global
Positioning System requirements changes.

Number of          Number of       Number of      Manpower
Modifications      Requirements    Inspections    Required
of Change          Issues          Required       to Make
Request                                           Change

7                  238             12             209.3 MW


Discussion

Although we believe we have made a
reasonable translation from a code oriented
requirements risk analysis to a component
oriented one, it is not clear that the resultant risk
model would be entirely usable because no matter
how we define the software entities of interest, we
still do not have equal visibility of the host
software and COTS. We suggest this is a
fundamental problem that has not been solved by
COTS advocates, particularly for safety critical
systems. Next we present a framework for
identifying and analyzing the cost of COTS.

Cost  as  the  Universal  COTS  Metric 

We  focus  on  factors  that  the  user  should 
consider  when  deciding  whether  to  use  COTS 
software [SCH992]. We take the approach of
using  the  common  denominator  cost.  This  is  done 
for  two  reasons:  first,  cost  is  obviously  of  interest 
in  making  such  decisions  and  second  a  single 
metric  -  cost  in  dollars  -  can  be  used  for 
evaluating  the  pros  and  cons  of  using  COTS.  The 
reason  is  that  various  software  system  attributes, 
like  acquisition  cost  and  availability  (i.e.,  the 
percentage  of  scheduled  operating  time  that  the 
system is available for use), are non-commensurate
quantities. That is, we cannot relate
quantitatively "a low acquisition cost" with "high
availability". These units are neither additive nor
multiplicative.  However,  if  it  were  possible  to 
translate  availability  into  either  a  cost  gain  or  loss 
for  COTS  software,  we  could  operate  on  these 
metrics  mathematically.  Naturally,  in  addition  to 
cost,  the  user  application  is  key  in  making  the 
decision.  Thus  one  could  develop  a  matrix  where 
one  dimension  is  application  and  the  other 
dimension  is  the  various  cost  elements.  We  show 
how  cost  elements  can  be  identified  and  how  cost 
comparisons  can  be  made  over  the  life  of  the 
software.  Obviously,  identifying  the  costs  would 
not  be  easy.  The  user  would  have  to  do  a  lot  of 
work  to  set  up  the  decision  matrix  but  once  it  was 
constructed,  it  would  be  a  significant  tool  in  the 
evaluation  of  COTS.  Furthermore,  even  if  all  the 
required  data  cannot  be  collected,  having  a 
framework  that  defines  software  system  attributes 
would  serve  as  a  user  guide  for  factors  to  consider 
when  making  the  decision  about  whether  to  use 
COTS  software  or  in-house  developed  software. 
Note that host software could be developed either
in-house or under contract. If the former, the
in-house cost elements below apply to host software.

Certainly,  different  applications  would  have 
varying  degrees  of  relationships  with  the  cost 
elements.  For  example,  flight  control  software 
would  have  a  stronger  relationship  with  the  cost  of 
unavailability  than  a  spreadsheet  application. 
Conversely,  the  latter  would  have  a  stronger 
relationship  with  the  cost  of  inadequacy  of  tool 


features  than  the  former.  Due  to  the  difficulty  of 
identifying  specific  COTS-related  costs,  our  initial 
approach  is  to  identify  cost  elements  on  the 
ordinal  scale.  Thus,  the  first  version  of  the 
decision  matrix  would  involve  ordinal  scale 
metrics  (i.e.,  the  cost  of  unreliability  is  more 
important  for  flight  control  software  than  for 
spreadsheet  applications).  As  the  field  of  COTS 
analysis  matures  and  as  additional  data  is 
collected  about  the  cost  of  using  COTS,  we  will 
be  able  to  refine  our  metrics  to  the  ratio  scale 
(e.g.,  the  cost  of  unreliability  in  a  host  system  is 
two  times  that  in  a  commercial  COTS  system). 

The  cost  elements  for  comparing  COTS 
software  with  in-house  software  are  identified 
below.  This  list  is  not  exhaustive;  its  purpose  is  to 
illustrate  the  approach.  These  elements  apply 
whether  we  are  comparing  a  system  comprised  of 
all  COTS  components  with  all  in-house 
components  or  comparing  only  a  subset  of  COTS 
components  with  corresponding  in-house 
components.  Explanatory  comments  are  made 
where  necessary.  Mean  values  are  used  for  some 
quantities  in  the  initial  framework.  This  is  the  case 
because  it  will  be  a  challenge  to  collect  any  data 
for  some  applications.  Therefore,  the  initial 
framework  should  not  be  overly  complex. 
Variance  and  statistical  distribution  information 
could  be  included  as  enhancements  if  the  initial 
framework  proves  successful. 

Cost  Elements 

Cc(j) = Cost of acquiring COTS software in year j.

Ci(j) = Cost of developing in-house software in
year j.

Uc(j) = Cost of upgrading COTS software in year
j.

Ui(j) = Cost of upgrading in-house software in
year j.

P(j) = Cost of personnel who use the software
system  in  year  j.  This  quantity  represents  the 
value  to  the  customer  of  using  the  software 
system. 

Mc(j)  =  Cost  per  unit  time  of  repairing  a  fault  in 
COTS  software  in  year  j.  This  is  the  cost  of 
customer  time  involved  in  resolving  a  problem 
with  the  vendor. 
Mi(j) = Cost per unit time of repairing a fault in
in-house  software  in  year  j. 

Rc(j) =  Mean  time  of  repairing  a  fault  that  causes  a 
failure  in  COTS  software  in  year  j.  This  is  the 
average  time  that  the  user  spends  in  resolving  a 
problem  with  the  vendor. 

Ri(j) = Mean time of repairing a fault that causes a
failure  in  in-house  software  in  year  j. 

T(j)  =  Scheduled  operating  time  for  the  software 
system  in  year  j. 

Ac(j)  =  Availability  of  software  system  that  uses 
COTS  software  in  year  j. 

Ai(j) = Availability of software system that uses
software  developed  in-house  in  year  j. 

These  quantities  are  the  fractions  of  T(j)  that  the 
software  system  is  available  for  use. 

Fc(j)  =  Failure  rate  of  COTS  software  in  year  j. 

Fi(j) = Failure rate of in-house software in year j.

These  quantities  are  the  number  of  failures  per 
year  that  cause  loss  of  productivity  and 
availability  of  the  software  system. 

In  some  applications,  some  or  all  of  the 
above  quantities  may  be  known  or  assumed  to  be 
constant  over  the  life  of  the  software  system. 
Using  the  above  cost  elements,  we  derive  the 
equations  for  the  annual  costs  of  the  two  systems 
and  the  difference  in  these  costs.  In  the  cost 
difference  calculations  that  follow,  a  positive 
quantity  is  favorable  to  in-house  development  and 
a  negative  quantity  is  favorable  to  COTS. 

Cost  of  Acquiring  Software 

Difference in annual cost = Cc(j) - Ci(j)  (1)

Cost  of  Upgrading  Software 

Difference in annual cost = Uc(j) - Ui(j)  (2)

Cost  of  Software  being  Unavailable  for  Use 

Annual  cost  of  COTS  software  being  unavailable 
for  use  =  (1-Ac(j))  *  P(j). 


Annual  cost  of  the  in-house  software  being 
unavailable for use = (1-Ai(j)) * P(j).

Difference  in  annual  cost  = 

P(j) * (Ai(j) - Ac(j))  (3)

Cost  of  Repairing  Software 

Average annual cost of repairing failed COTS
software = Fc(j) * T(j) * Rc(j) * Mc(j).

Average annual cost of repairing failed in-house
software = Fi(j) * T(j) * Ri(j) * Mi(j).

Difference in annual cost =

T(j) * ((Fc(j) * Rc(j) * Mc(j)) - (Fi(j) * Ri(j) *
Mi(j)))  (4)

Then, TCj, the total difference in cost in year j, is the
sum of (1), (2), (3), and (4). Because there is the
opportunity  to  invest  funds  in  alternate  projects, 
costs  in  different  years  are  not  equivalent  (i.e., 
funds  available  today  have  more  value  than  an 
equal  amount  in  the  future  because  they  could  be 
invested  today  and  earn  a  future  return). 
Therefore,  a  stream  of  costs  over  the  life  of  the 
software  for  n  years  must  be  discounted  by  k,  the 
rate  of  return  on  alternate  use  of  funds.  Thus  the 
total  discounted  cost  differential  between  COTS 
software  and  in-house  software  is: 

$\sum_{j=1}^{n} TC_j / (1 + k)^j$
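The calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `YearCosts` and both function names are hypothetical, the field names simply mirror the symbols defined under Cost Elements, and the discount exponent is taken to start at j = 1.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class YearCosts:
    """Cost elements for one year j; field names mirror the symbols in the text."""
    Cc: float  # cost of acquiring COTS software
    Ci: float  # cost of developing in-house software
    Uc: float  # cost of upgrading COTS software
    Ui: float  # cost of upgrading in-house software
    P: float   # cost of personnel who use the system
    Mc: float  # cost per unit time of repairing a COTS fault
    Mi: float  # cost per unit time of repairing an in-house fault
    Rc: float  # mean repair time, COTS
    Ri: float  # mean repair time, in-house
    T: float   # scheduled operating time
    Ac: float  # availability with COTS software (fraction of T)
    Ai: float  # availability with in-house software (fraction of T)
    Fc: float  # failure rate, COTS
    Fi: float  # failure rate, in-house


def annual_difference(y: YearCosts) -> float:
    """TC_j: the sum of differences (1) through (4) for one year."""
    acquire = y.Cc - y.Ci                                              # (1)
    upgrade = y.Uc - y.Ui                                              # (2)
    unavailable = y.P * (y.Ai - y.Ac)                                  # (3)
    repair = y.T * (y.Fc * y.Rc * y.Mc - y.Fi * y.Ri * y.Mi)           # (4)
    return acquire + upgrade + unavailable + repair


def discounted_difference(years: Sequence[YearCosts], k: float) -> float:
    """Sum of TC_j / (1 + k)^j for j = 1..n, with k the rate of return."""
    return sum(annual_difference(y) / (1 + k) ** j
               for j, y in enumerate(years, start=1))
```

Following the sign convention stated earlier, a positive result favors in-house development and a negative result favors COTS.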

In  this  initial  formulation,  we  have  not 
included  possible  differences  in  functionality 
between  the  two  approaches.  However,  a 
reasonable  assumption  is  that  COTS  software 
would  not  be  considered  unless  it  could  provide 
minimum  functionality  to  satisfy  user 
requirements.  Thus,  a  typical  decision  for  the  user 
is  whether  it  is  worth  the  additional  life  cycle 
costs  to  develop  an  in-house  software  system  with 
all  the  desirable  attributes. 

Conclusions 

The  decision  to  employ  COTS  on  mission 
critical  systems  should  not  be  based  on 
development  cost  alone.  Rather,  costs  should  be 
evaluated  on  a  total  life  cycle  basis  and  RMA 
should be evaluated in a system context (i.e.,
COTS components embedded in a host system).
COTS  suppliers  should  also  consider  making 
available  more  detailed  information  regarding  the 
behavior  of  their  systems,  and  certifying  that  their 
components  satisfy  a  specified  set  of  behavioral 
properties.  In  addition,  a  formal  risk  assessment  of 
requirements  should  be  performed  taking  into 
account  the  characteristics  of  host  system 
environments. 
Abstract

Despite the fact that there has been a surge of publications in verification and validation of knowledge-based systems and expert systems in
the past decade, there are still gaps in the study of verification and validation (V&V) of expert systems, not the least of which is the lack of
appropriate semantics for expert system programming languages. Without a semantics, it is hard to formally define and analyze knowledge
base anomalies such as inconsistency and redundancy, and it is hard to assess the effectiveness of V&V tools, methods and techniques that
have been developed or proposed. In this paper, we develop an approximate declarative semantics for rule-based knowledge bases and
provide a formal definition and analysis of knowledge base inconsistency, redundancy, circularity and incompleteness in terms of theories in
the first order predicate logic. In the paper, we offer classifications of commonly found cases of inconsistency, redundancy, circularity and
incompleteness. Guidelines on how to remedy knowledge base anomalies are given. © 1999 Elsevier Science B.V. All rights reserved.
1. Introduction

The last decade has witnessed a surge of publications in
verification and validation (V&V) of expert systems and
knowledge-based systems which resulted in several books
[1,2], and special issues of several journals [3-6]. Major AI
conferences have had workshops and special sessions that
were devoted to the issue. A sample of additional publications
can be found in Refs. [7-40]. Many V&V methods,
techniques and tools have been proposed, developed or
implemented for expert system applications. On the other
hand, advances in knowledge engineering have resulted in
better methodologies and practice that aim at reducing
errors and faults during system development and maintenance
[41-44]. Despite all these activities, there are still
gaps  in  the  study  of  V&V  of  expert  systems,  not  the  least 
of  which  is  the  lack  of  appropriate  semantics  for  expert 
system  programming  languages.  Without  a  semantics,  it  is 
hard  to  formally  define  and  analyze  knowledge  base  (KB) 
anomalies  such  as  inconsistency  and  redundancy,  and  it  is 
hard  to  assess  the  effectiveness  of  V&V  tools,  methods  and 
techniques  that  have  been  developed  or  proposed. 


V&V of expert systems in general and V&V of KB in
particular need to be based on a sound theoretical foundation.
However, the reality is that "the construction of either
declarative or Hoare-style semantics for current rule-based
languages is a hopeless task" [31]. In the long run, concern
for verifiability and reliability should lead to the development
of programming languages with tractable semantics
for expert system applications. In the meantime, some
approximate  semantics  (declarative  or  imperative)  is  needed 
to  enable  a  formal  analysis  of  properties  of  expert  system 
components  (such  as  a  KB).  For  example,  sketches  of  an 
approximate  declarative  semantics,  which  is  based  on  a 
logical  interpretation  of  a  rule  base,  and  an  approximate 
imperative  semantics,  which  is  based  on  axiomatic  logic 
and  invariants,  for  the  current  rule-based  programming 
languages  were  proposed  in  Ref.  [31]. 

Adopting  a  declarative  semantics  for  a  rule-based 
language  has  some  potential  difficulties:  (a)  It  is  hard  to 
provide  a  purely  declarative  interpretation  of  rules,  because 
they  often  behave  in  an  imperative  manner  with  the  intended 
side  effects  of  updating  a  working  memory.  Simply  treating 
a  rule  base  as  a  logical  theory  may  result  in  an  excessively 
conservative semantics. (b) Due to the fact that consistency
in the first order logic is semi-decidable, there does not exist
an algorithm that can find all inconsistencies and redundancies
in an arbitrary first order KB, thus making it difficult to
develop practical V&V tools.

Table 1
Typesetting conventions

Symbol                        Meaning

D                             A nonempty domain of elements
ℐ                             An interpretation
Boldface capital letter       Set of wff (literals), or set of rules
Ordinary capital letter       Individual wff (literal)
Lower-case ordinary letter    Constant
Lower-case italic letter(s)   Predicate
r_i                           Rule label
f_i                           Fact label
LHS(r_i)                      Set of literals in the left-hand side of r_i
RHS(r_i)                      Set of literals in the right-hand side of r_i
true, false                   Logical values
x, y, z, x', y', z'           Variable

There have been several efforts toward providing a
precise characterization of the logical nature of a rule-based
KB [11,31,35]. An algorithm to detect all inconsistencies
and redundancies in "a certain well-defined,
reasonably expressive, subset of all quasi-first-order-logic
KB" is presented in [11].¹ The results in [35] indicate that a
rule-based language is still amenable to logical analysis.

The purposes of this paper are to: (a) Provide an approximate
declarative semantics for rule-based KB so that
various KB anomalies can be formally defined and correctly
understood. We go beyond the results of [11,31,35] by dealing
with not only KB inconsistencies and redundancies, but
also KB circularity and incompleteness. (b) Establish KB
anomaly analysis procedures using theories in the first order
predicate logic (such as the model theory, satisfiability, and
derivability of certain tautologous well-formed formulas
[45-47]). This may serve as the theoretical underpinnings
of practical V&V tools. (c) Offer classifications for cases of
inconsistency, redundancy, circularity and incompleteness
commonly found in rule-based KB. (d) Propose guidelines
on how to remedy the anomalies once they are identified.

The  rest  of  the  paper  is  organized  as  follows:  Section  2 
briefly  reviews  the  terms  and  concepts  to  be  used  throughout 
the  paper.  Definitions,  classifications  and  analyses  of  KB 
inconsistency,  redundancy,  circularity  and  incompleteness 
are  provided  in  Sections  3-6,  respectively.  Some  possible 
remedial  measures  for  KB  anomalies  are  discussed  in 
Section  7.  Section  8  concludes  with  remarks  about  future 
work. 


¹ The key step in the algorithm is the subsumption tests which must be
decidable for a given KB in order for the KB to be completely analyzed for
inconsistency and redundancy. The subsumption tests will be decidable
only when the expressions to be tested satisfy the quantifier decoupled
(q-decoupled) property [11]. In general, one does not know in advance if
a  given  KB  will  generate  any  non  q-decoupled  expressions  because  there 
does  not  exist  a  syntactic  test  for  determining  the  q-decoupleability  of  the 
KB. 


2.  Preliminaries 

We assume that the reader is familiar with the basic
concepts and terminology in the first order predicate logic
[45-47]. We use wff to denote the well-formed formulas in
the predicate logic. An atomic formula (or atom) refers to an
n-place predicate symbol and its n terms. A ground atom is
one not containing any variables. A literal is an atom or its
negation. To avoid confusion, we adopt the typesetting
conventions as given in Table 1.

Definition 1. An interpretation of a wff consists of a nonempty
domain D, and an assignment of "values" to each
constant, function symbol and predicate symbol appearing
in the wff according to the following: (a) assigning an
element of D to each constant; (b) assigning a mapping
from D^n to D to each n-ary function symbol; and (c) assigning
a mapping from D^n to {true, false} to each n-ary predicate
symbol.

Definition 2. A wff H (or a set Ω of wff) is satisfiable
(consistent) if and only if there exists an interpretation ℐ
such that H (or every wff in Ω) is evaluated to true for all
variable assignments² under ℐ, which is denoted ⊨_ℐ H
(⊨_ℐ Ω). ℐ is said to be a model of H (Ω) and ℐ satisfies H (Ω).
H (Ω) is inconsistent if and only if there exists no model for
H (Ω). H is said to be valid (tautologous) if and only if every
possible interpretation satisfies H. H is a logical consequence
of Ω if and only if every model of Ω is also a
model of H. This is denoted as Ω ⊨ H.


Theorem 1. Given a set of wff Ω = {P, ..., Q} and a wff H,
Ω ⊨ H if and only if P ∧ ... ∧ Q → H is valid.
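For the propositional case, Theorem 1 can be checked by enumerating interpretations. The sketch below is only an illustration: the function names are hypothetical, formulas are modeled as Python predicates over a truth assignment, and the first order case (which cannot be enumerated this way) is out of scope.

```python
from itertools import product


def interpretations(symbols):
    """Yield every truth assignment to the given propositional symbols."""
    for values in product([True, False], repeat=len(symbols)):
        yield dict(zip(symbols, values))


def entails(premises, conclusion, symbols):
    """Logical consequence: every model of all premises is a model of the conclusion."""
    return all(conclusion(i)
               for i in interpretations(symbols)
               if all(p(i) for p in premises))


def implication_valid(premises, conclusion, symbols):
    """Validity of (P and ... and Q) -> H: true under every interpretation."""
    return all((not all(p(i) for p in premises)) or conclusion(i)
               for i in interpretations(symbols))


# Premises P and P -> Q, conclusion Q (modus ponens).
premises = [lambda i: i["P"], lambda i: (not i["P"]) or i["Q"]]
conclusion = lambda i: i["Q"]
both_hold = (entails(premises, conclusion, ["P", "Q"])
             and implication_valid(premises, conclusion, ["P", "Q"]))
```

The two functions agree on every input, which is exactly what the theorem states for this restricted setting.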


Definition  3.  Let  £  and  be  sets  of  wff.  £  £  deno 

that  £  is  satisfiable  if  and  only  if  Q1  is  satisfiable  [45]. 


This paper focuses on rule-based knowledge bases. A
rule-based KB can be divided into a set of facts which is
stored in a working memory (WM) and a set of rules stored
in a rule base (RB). Rules represent general knowledge
about an application domain. They are entered into a RB
during initial knowledge acquisition or subsequent
updates. Facts in a WM provide specific information
about the problems at hand and may be elicited either dynamically
from the user during each problem-solving session,
or statically from the domain expert during knowledge
acquisition process, or derived through rule deduction.


² A variable assignment is a mapping from variables in a wff to elements
in D.
Table 2
Same, synonymous, complementary, mutual exclusive, incompatible, and conflict literals

Semantics: Equivalent

  Syntax identical:  Same: denoted L_1 = L_2.(a) L_1 and L_2 are syntactically
                     identical (same predicate symbol, same arity, and same
                     terms at corresponding positions)

  Syntax different:  Synonymous: denoted L_1 ≅ L_2.(b) L_1 and L_2 are
                     syntactically different, but logically equivalent

Semantics: Conflict(c)

  Syntax identical:  Complementary: denoted L_1 # L_2. L_1 and L_2 are an atom
                     and its negation

  Syntax different:  Mutual exclusive: denoted L_1 ⊕ L_2. L_1 and L_2 are
                     syntactically different and semantically have opposite
                     truth values

                     Incompatible: denoted L_1 ≇ L_2. L_1 and L_2 are a
                     complementary pair of synonymous literals

(a) Given two rules r_i and r_j, if LHS(r_i) = {P1, ..., Pn} and LHS(r_j) = {P1', ..., Pn'}, then LHS(r_i) = LHS(r_j) iff ∀i ∈ [1, n] Pi = Pi'.

(b) Given two rules r_i and r_j, if LHS(r_i) = {P1, ..., Pn} and LHS(r_j) = {P1', ..., Pn'}, then LHS(r_i) ≅ LHS(r_j) iff ∀i ∈ [1, n] Pi ≅ Pi'.

(c) L_1 and L_2 are conflict literals if (L_1 # L_2) ∨ (L_1 ⊕ L_2) ∨ (L_1 ≇ L_2).


Definition 4. Rules in a KB have the format: P_1 ∧ ... ∧
P_n → R, where the P_i's are the conditions (collectively, the
left-hand side, LHS, of a rule), R is the conclusion (or
right-hand side, RHS, of a rule), and the symbol "→" is
understood as the logical implication. The P_i's and R are literals.
If the conditions of a rule instance are satisfied by facts in
WM, then its conclusion is deposited into WM.


Definition 5. A fact is represented as a ground atom; it
specifies an instance of a relationship among particular
objects in the problem domain. WM contains a collection
of positive ground atoms, which are deposited through
either assertion (initial or dynamic), or rule deduction.


Definition 6. A negated condition ¬p(x) in the LHS of a
rule is satisfied if p(x) is not in WM for any x. A negated
ground atom ¬p(a) in the LHS of a rule is satisfied if p(a) is
not in WM. A negated conclusion ¬R in the RHS of a rule
results in the removal of R from WM, when the LHS of the
rule is satisfied.³ Rule instances and negated literals can be
utilized by the inference system, but are never deposited
into WM [11].
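The firing behavior described in Definitions 4 to 6 can be sketched for ground rules as follows. This is a minimal illustration, not the inference engine of any particular expert system shell: the names `Literal`, `Rule`, `satisfied` and `fire` are hypothetical, atoms are plain strings, and the non-ground case of a negated condition ¬p(x) is not handled.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Literal(NamedTuple):
    atom: str              # a ground atom such as "p(a)"
    negated: bool = False


class Rule(NamedTuple):
    lhs: Tuple[Literal, ...]   # conditions
    rhs: Literal               # conclusion


def satisfied(lit: Literal, wm: FrozenSet[str]) -> bool:
    """A positive condition needs its atom in WM; a negated one needs it absent."""
    return (lit.atom not in wm) if lit.negated else (lit.atom in wm)


def fire(rule: Rule, wm: FrozenSet[str]) -> FrozenSet[str]:
    """Return the WM state after attempting to fire one rule.

    WM only ever holds positive ground atoms: a positive conclusion is
    deposited, a negated conclusion removes its atom (no effect if absent).
    """
    if not all(satisfied(l, wm) for l in rule.lhs):
        return wm
    if rule.rhs.negated:
        return wm - {rule.rhs.atom}
    return wm | {rule.rhs.atom}


wm0 = frozenset({"P", "Q"})
wm1 = fire(Rule((Literal("P"), Literal("Q")), Literal("A")), wm0)
wm2 = fire(Rule((Literal("A"), Literal("B", True)), Literal("P", True)), wm1)
```

Here the first rule deposits A, and the second rule (conditions A and ¬B, conclusion ¬P) removes P from WM.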


Definition 7. Given two sets of literals L and L', L' is said
to be a specialization of L, denoted L' < L, if there exists a
nonempty set of substitutions θ, such that L' = (L)θ. In
particular, a literal P' is a specialization of P, denoted as
P' < P, if there exists a nonempty set of substitutions θ such
that P' = (P)θ.
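For single literals, the specialization test of Definition 7 amounts to one-way matching. The sketch below rests on several assumptions that are not in the text: literals are modeled as (predicate, arguments) pairs without function symbols, variables are recognized by membership in a fixed set, and "nonempty" is read as "binds at least one variable to a different term". All names are hypothetical.

```python
VARIABLES = {"x", "y", "z"}  # assumed naming convention for variables


def match(general, specific):
    """Return a substitution theta with (general)theta == specific, or None."""
    pred_g, args_g = general
    pred_s, args_s = specific
    if pred_g != pred_s or len(args_g) != len(args_s):
        return None
    theta = {}
    for g, s in zip(args_g, args_s):
        if g in VARIABLES:
            # A variable must map to the same term at every occurrence.
            if theta.setdefault(g, s) != s:
                return None
        elif g != s:
            return None
    return theta


def is_specialization(specific, general):
    """True if `specific` is obtained from `general` by a non-trivial substitution."""
    theta = match(general, specific)
    return theta is not None and any(v != t for v, t in theta.items())


general = ("father", ("x", "john"))
specific = ("father", ("bob", "john"))
theta = match(general, specific)
```

With these inputs, `theta` binds x to bob, so father(bob, john) is a specialization of father(x, john).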


Definition 8. Given a set L of n literals, p(L) represents
the set of all literal permutations in L.

³ There would be no effect on WM if R is not in WM when ¬R is derived.


Definition 9. If r_i is a rule and P is a literal, the expression
r_i ⊢ P is used to indicate arbitrary length derivation of P from
r_i in terms of some inference methods.⁴


Using logical equivalence, we can always convert a logical
implication into a disjunction of literals. We further
simplify the notation by dropping the logical connective
"∨" from such a disjunction. For instance, the set of wff
{P ∧ Q → R, U ∧ ¬V → W} has the following logically
equivalent short representation: {¬P ¬Q R, ¬U V W} where
each element in the set is a disjunction of literals.
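The conversion relies on the equivalence of P ∧ Q → R with ¬P ∨ ¬Q ∨ R. A minimal sketch, assuming literals are written as strings with a leading "¬" for negation (the function names are hypothetical):

```python
def negate(literal: str) -> str:
    """Flip the sign of a literal written with a leading '¬' for negation."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def rule_to_clause(lhs, rhs):
    """Convert P1 ∧ ... ∧ Pn → R into the list of literals ¬P1, ..., ¬Pn, R."""
    return [negate(l) for l in lhs] + [rhs]


def short_form(clause) -> str:
    """Write a disjunction with the connective dropped, as in the text."""
    return " ".join(clause)


clauses = [rule_to_clause(["P", "Q"], "R"), rule_to_clause(["U", "¬V"], "W")]
printed = [short_form(c) for c in clauses]
```

Note that the negated condition ¬V becomes the positive literal V in the clause, matching the short representation given above.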


Definition 10. The concepts of the same, synonymous,
complementary, mutual exclusive, incompatible, and
conflict literals are defined in Table 2 in terms of syntax
and semantics considerations.


Example 1. Given the following literals: father(x, john),
male_parent(x, john), animal(sea_cucumber), vegetable(sea_cucumber),
bird(fred), ¬bird(fred), sent_to(x, emergency_room),
sent_to(x, waiting_room), expensive(x),
high_priced(x), we have:

father(x, john) = father(x, john) (same);
father(x, john) ≅ male_parent(x, john) (synonymous);
bird(fred) # ¬bird(fred) (complementary);
animal(sea_cucumber) ⊕ vegetable(sea_cucumber) (mutual exclusive);
sent_to(x, emergency_room) ⊕ sent_to(x, waiting_room) (mutual exclusive);
expensive(x) ≇ ¬high_priced(x) (incompatible);
father(x, john) and ¬male_parent(x, john) are conflict literals.


⁴ Strictly speaking, the expression should be {r_i ∪ WM} ⊢ P because
facts in WM will be used during the derivation.
Table 3
Types of inconsistency

Notation: # complementary, ⊕ mutual exclusive, ≅ synonymous, ≇ incompatible (Table 2); ⊢ derivation (Definition 9).

Type   Description                                                               Pattern

I-1    Rules with the same LHS result in complementary conclusions               LHS(r_i) = LHS(r_j) and r_i ⊢ P and r_j ⊢ Q, where P # Q
I-2    Rules with shared condition(s) result in complementary conclusions        LHS(r_i) ∩ LHS(r_j) ≠ ∅ and r_i ⊢ P and r_j ⊢ Q, where P # Q
I-3    Rules with the same LHS result in mutual exclusive conclusions            LHS(r_i) = LHS(r_j) and r_i ⊢ P and r_j ⊢ Q, where P ⊕ Q
I-4    Rules with shared conditions result in mutual exclusive conclusions       LHS(r_i) ∩ LHS(r_j) ≠ ∅ and r_i ⊢ P and r_j ⊢ Q, where P ⊕ Q
I-5    Rules with the same LHS result in incompatible conclusions                LHS(r_i) = LHS(r_j) and r_i ⊢ P and r_j ⊢ Q, where P ≇ Q
I-6    Rules with shared condition(s) result in incompatible conclusions         LHS(r_i) ∩ LHS(r_j) ≠ ∅ and r_i ⊢ P and r_j ⊢ Q, where P ≇ Q
I-7    Rules with synonymous LHS result in complementary conclusions             LHS(r_i) ≅ LHS(r_j) and r_i ⊢ P and r_j ⊢ Q, where P # Q
I-8    Rules with shared synonymous conditions result in complementary           L ∈ LHS(r_i) and L' ∈ LHS(r_j) and L ≅ L' and r_i ⊢ P and
       conclusions                                                               r_j ⊢ Q, where P # Q
I-9    Rules with synonymous LHS result in mutual exclusive conclusions          LHS(r_i) ≅ LHS(r_j) and r_i ⊢ P and r_j ⊢ Q, where P ⊕ Q
I-10   Rules with shared synonymous conditions result in mutual exclusive        L ∈ LHS(r_i) and L' ∈ LHS(r_j) and L ≅ L' and r_i ⊢ P and
       conclusions                                                               r_j ⊢ Q, where P ⊕ Q
I-11   Rules with synonymous LHS result in incompatible conclusions              LHS(r_i) ≅ LHS(r_j) and r_i ⊢ P and r_j ⊢ Q, where P ≇ Q
I-12   Rules with shared synonymous conditions result in incompatible            L ∈ LHS(r_i) and L' ∈ LHS(r_j) and L ≅ L' and r_i ⊢ P and
       conclusions                                                               r_j ⊢ Q, where P ≇ Q
I-13   Rules with consistent LHS result in complementary conclusions             ⊨_ℐ {LHS(r_i), LHS(r_j)} ∧ LHS(r_i) ∩ LHS(r_j) = ∅ ∧ r_i ⊢ P
                                                                                 and r_j ⊢ Q, where P # Q
I-14   Rules with consistent LHS result in mutual exclusive conclusions          ⊨_ℐ {LHS(r_i), LHS(r_j)} ∧ LHS(r_i) ∩ LHS(r_j) = ∅ ∧ r_i ⊢ P
                                                                                 and r_j ⊢ Q, where P ⊕ Q
I-15   Rules with consistent LHS result in incompatible conclusions              ⊨_ℐ {LHS(r_i), LHS(r_j)} ∧ LHS(r_i) ∩ LHS(r_j) = ∅ ∧ r_i ⊢ P
                                                                                 and r_j ⊢ Q, where P ≇ Q
II-1   Rules with a condition result in complementary literal                    r_i ⊢ Q, where P ∈ LHS(r_i) ∧ P # Q
II-2   Rules with a certain condition result in incompatible literal             r_i ⊢ Q, where P ∈ LHS(r_i) ∧ P ≇ Q
II-3   Rules with a condition P result in mutual exclusive literal               r_i ⊢ Q, where P ∈ LHS(r_i) ∧ P ⊕ Q

In  this  paper,  we  do  not  consider  the  situation  in  which 
rules  are  augmented  with  certainty  factors.  Because  of  the 
way  they  are  defined,  rules  and  facts  are  subsets  of  wff. 
Therefore, the terms “rule” and “fact” can be freely replaced
by  the  term  “wff”  throughout  the  rest  of  the  paper. 

3.  KB  inconsistency 

3.1.  Definition  of  inconsistency 

The root cause of KB inconsistency lies in the rules in RB,
but its manifestation is through WM. For instance, the
inconsistency of a RB containing a pair of rules {p(x) →
q(x), p(x) → ¬q(x)} is not apparent until a fact p(a) is
asserted into WM. In general, although the rules in a RB
may  be  consistent  on  their  own  (because  there  exists  a 
model  for  them),  they  can  form  an  inconsistent  theory 
when  combined  with  certain  facts  in  WM.  In  order  for  a 
KB  to  be  consistent,  there  needs  to  be  a  model  for  both  RB 
and  WM. 
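The point that the inconsistency stays latent until a triggering fact arrives can be illustrated with the pair of rules above, instantiated for the constant a. The sketch below makes two simplifying assumptions that are not in the text: rules are treated purely as logical implications over ground literals (negated conclusions are kept as derived literals rather than removing facts from WM), and literals are strings with a leading "¬" for negation. The function names are hypothetical.

```python
def derive(rules, facts):
    """Close a set of ground literals under the rules by forward chaining."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if set(lhs) <= derived and rhs not in derived:
                derived.add(rhs)
                changed = True
    return derived


def complementary_pairs(literals):
    """Return the atoms that occur both positively and negated."""
    return sorted(l for l in literals
                  if not l.startswith("¬") and "¬" + l in literals)


# Ground instances of p(x) -> q(x) and p(x) -> ¬q(x) for the constant a.
rules = [(["p(a)"], "q(a)"), (["p(a)"], "¬q(a)")]

without_fact = complementary_pairs(derive(rules, set()))
with_fact = complementary_pairs(derive(rules, {"p(a)"}))
```

With an empty WM nothing conflicting is derived; once p(a) is asserted, both q(a) and ¬q(a) follow.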

On the other hand, facts in WM are changing over time due
to dynamic assertions and retractions. If we use subscripts to
denote states of WM at different times, RB may be consistent
with WM_i, but inconsistent with WM_j where i ≠ j.
Thus,  relying  on  a  particular  WM  state  in  verifying  the 
consistency  of  RB  may  not  produce  an  accurate  result. 


Definition 11. Let WM_0 and R(WM_0) denote the
initial state for WM and the reachability set of all
possible WM states from WM_0, respectively. Let
WM denote all legitimate facts⁵ for an application.
WM = ∪{WM_i | WM_i ∈ R(WM_0)}.

Definition 12. Given two interpretations ℐ and ℐ_1, ℐ is an
extension of ℐ_1, denoted as ℐ ⊇ ℐ_1, if the domain and assignments
in ℐ_1 are retained in ℐ.

Definition 13. Let ℐ_WM be a model for WM.⁶ A KB is
inconsistent if and only if ¬∃ℐ [ℐ ⊇ ℐ_WM ∧ ⊨_ℐ RB].

During the problem-solving process, inconsistent rules in RB
allow derivations of conflicting (complementary, mutual
exclusive and incompatible) outcomes from the same,
synonymous or consistent conditions, thus seriously
compromising the reliability and correctness of
knowledge-based systems.

3.2.  Classification  of  inconsistency 

Two types of inconsistency are classified in Table 3. Each
type consists of a set of patterns and each pattern encompasses
different cases. Type I contains anomalous situations
where rules with the same or synonymous conditions result
in conflict (complementary, mutual exclusive and incompatible)
conclusions. Type II captures the scenarios where a
chain of deduction involves a condition and a conclusion (at
two ends of the chain) which are either complementary, or
mutual exclusive, or incompatible. It is very important to
recognize the types of inconsistency for several reasons: (a)

⁵ Facts that satisfy the validity constraints of the application domain.

⁶ If there are validity constraints on facts in WM, then the models considered
are restricted to those that satisfy the constraint.
so that effective detection algorithms can be developed; (b)
the completeness of the V&V tools can be measured.

The exhaustive nature of the classification can be considered
by enumerating all cases that result in an unsatisfiable
RB (Definition 13). The clue is the derivation of conflict
literals by a RB or a derived literal being in conflict with a
fact in WM. Due to space limitations, we skip a formal proof.

3.3.  Analysis 

Given a RB and a WM containing a set of rules and a set of facts, respectively, we can show that the KB is consistent by trying to find a model for it. The way we try to find a model for the KB is through considering an arbitrary interpretation $\mathcal{I}$. If $\mathcal{I}$ satisfies the KB (i.e. $\mathcal{I}$ satisfies RB and WM), then $\mathcal{I}$ is a model for it; otherwise, there is no model for the KB. If a model is found, then the KB is consistent; otherwise, it is inconsistent. We show the analysis through some examples.

Example 2. Given a KB consisting of a $RB = \{r_1, r_2, r_3, r_4, r_5\}$ and a $WM = \{f_1, f_2, f_3\}$ shown below

$r_1: P \wedge Q \rightarrow A$        $f_1: P$
$r_2: R \wedge Q \rightarrow B$        $f_2: Q$
$r_3: A \wedge B \rightarrow W$        $f_3: R$
$r_4: A \rightarrow D$
$r_5: B \rightarrow \neg D$

we can show that there is no model for the KB; thus, it is inconsistent.

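Proof. We convert the KB into the set below

$\Omega_1 = \{\neg P\,\neg Q\,A,\ \neg R\,\neg Q\,B,\ \neg A\,\neg B\,W,\ \neg A\,D,\ \neg B\,\neg D,\ P,\ Q,\ R\}$

Let $\mathcal{I}$ be any interpretation for $\Omega_1$.

• If $\mathcal{I}$ is a model for $\Omega_1$, then $\models_{\mathcal{I}} P$, $\models_{\mathcal{I}} Q$, and $\models_{\mathcal{I}} R$;

• According to the first two elements in $\Omega_1$, there must be $\models_{\mathcal{I}} A$ and $\models_{\mathcal{I}} B$;

• Since $\models_{\mathcal{I}} A$ and $\models_{\mathcal{I}} B$, there must be $\models_{\mathcal{I}} D$ and $\models_{\mathcal{I}} \neg D$ in order for $\neg A\,D$ and $\neg B\,\neg D$ to be true. But this is impossible. As a result, one of the rules $\neg A\,D$ and $\neg B\,\neg D$ must be false under $\mathcal{I}$.

• Since $\mathcal{I}$ cannot satisfy all rules in $\Omega_1$, it is not a model for $\Omega_1$. Because $\mathcal{I}$ is an arbitrary interpretation, there is no model for $\Omega_1$. Thus, the given KB is inconsistent. □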

The inconsistency in Example 2 is of type I-3 because $r_4$ and $r_5$ have different but consistent LHS and result in conflicting conclusions $D$ and $\neg D$. The proof procedure can be automated using the resolution principle, where the derivation of an empty clause amounts to the failure of finding a model (or the presence of inconsistency in the KB). In practice, we can use the structure of the derivation generated by the resolution principle to extract a set of inconsistent rules.
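The automated check can be sketched in a few lines of Python. This is only an illustrative propositional resolution loop, not the tool described in the paper: the string encoding of literals (`~P` for a negated literal) and the helper names `negate`, `resolvents`, `refutes` and `clause` are assumptions made for this example.

```python
from itertools import combinations

def negate(lit):
    """Return the complement of a literal written as 'P' or '~P'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """Yield every non-tautologous resolvent of two clauses (frozensets of literals)."""
    for lit in c1:
        if negate(lit) in c2:
            r = (c1 - {lit}) | (c2 - {negate(lit)})
            if not any(negate(x) in r for x in r):
                yield frozenset(r)

def refutes(clauses):
    """Return True if the empty clause is derivable, i.e. the clause set has no model."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:          # empty clause: no model exists
                    return True
                new.add(r)
        if new.issubset(known):    # saturation reached without a contradiction
            return False
        known |= new

def clause(text):
    """Build a clause from a space-separated list of literals."""
    return frozenset(text.split())

# Clause form of Example 2: five rules followed by the facts P, Q, R.
omega1 = [clause(c) for c in
          ["~P ~Q A", "~R ~Q B", "~A ~B W", "~A D", "~B ~D", "P", "Q", "R"]]

print(refutes(omega1))       # True: the KB of Example 2 is inconsistent
print(refutes(omega1[:-1]))  # False: without the fact R a model exists
```

Tautologous resolvents are discarded here only to keep the search small; this does not affect whether the empty clause is found.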

The above example demonstrates an inconsistency in the current state of a KB. There is, however, another scenario in which the proof procedure yields a model for a KB, but there exists the potential of inconsistency in a possible future state of the KB. Consider the situation where fact $f_3$ is a legitimate input but is not present in the WM at the time of checking: the proof procedure will find a model for $(KB - \{f_3\})$ and conclude that it is consistent. (This coincides with the intuitive explanation that the conflicting conclusion $\neg D$ is not deducible because the LHS of $r_2$ cannot be satisfied (footnote 7).) However, inconsistency arises when fact $f_3$ is asserted into WM. This phenomenon confirms our earlier arguments that:

• The cause of inconsistency stems from rules, but facts will help expose the inconsistency. Thus the inconsistency checking should involve both RB and WM.

• KB consistency can be either temporary or persistent. For instance, $KB - \{f_3\}$ is temporarily consistent until $f_3$ is asserted. Such a transient consistency is not a reliable indicator. What is needed is an ultimate consistency that guarantees that a KB will be consistent for all possible states.

•  The  set  of  all  legitimate  facts  in  an  application  domain 
usually  changes  with  time.  Given  a  time  period,  it  is 
important  to  identify  the  set  of  all  legitimate  facts  during 
the  period  in  order  to  conclude  whether  a  KB  will  be 
persistently  consistent  during  the  period. 
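The difference between temporary and persistent consistency can be illustrated with a brute-force sketch. It assumes a propositional KB in clause form and simply tries every subset of the legitimate facts as a possible WM state; the function names and the string encoding (`~P` for negation) are hypothetical, and the exhaustive search is only practical for very small examples.

```python
from itertools import combinations, product

# Rules of Example 2 in clause form.
RULES = ["~P ~Q A", "~R ~Q B", "~A ~B W", "~A D", "~B ~D"]

def has_model(clauses):
    """Brute-force search for a truth assignment satisfying every clause."""
    parsed = [c.split() for c in clauses]
    atoms = sorted({lit.lstrip("~") for c in parsed for lit in c})
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        # A literal holds when its atom's value differs from "is negated".
        if all(any(env[l.lstrip("~")] != l.startswith("~") for l in c)
               for c in parsed):
            return True
    return False

def persistently_consistent(rules, legitimate_facts):
    """Check every subset of the legitimate facts, i.e. every possible WM state."""
    for k in range(len(legitimate_facts) + 1):
        for facts in combinations(legitimate_facts, k):
            if not has_model(rules + list(facts)):
                return False
    return True

print(has_model(RULES + ["P", "Q"]))                    # True: consistent for now
print(persistently_consistent(RULES, ["P", "Q", "R"]))  # False: asserting R breaks it
print(persistently_consistent(RULES, ["P", "Q"]))       # True if R is never a legal input
```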

Operationally, when a pair of conflicting conclusions is derived, it amounts to a fact retraction in WM. In a rule-based programming language, there are two types of fact retraction: an explicit one through a language construct such as retract, and an implicit one through derivation of a negated fact and the negation as absence rule for WM. The implicit fact retraction would be an indicator for RB inconsistency, but it is not a necessary condition for RB inconsistency. The reason is that, in general, a rule-based system may not have the Church-Rosser property (footnote 8); therefore the facts derived by RB for the same initial facts in WM may not be unique. For instance, when both $r_4$ and $r_5$ are enabled, depending on the conflict resolution strategy used by the control component of the system, $r_4$ and $r_5$ can be fired in different order. As a result, different sets of output (derived facts) will be produced.
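A toy illustration of this order dependence is given below. It assumes a deliberately simplified engine in which each rule fires at most once and a derived negated fact removes its positive counterpart from WM (negation as absence); the `fire` function and the rule encoding are hypothetical and do not model any particular rule-based language.

```python
def fire(rules, facts, order):
    """Fire each enabled rule once, in the given order, and return the final WM."""
    wm = set(facts)
    for name in order:
        lhs, rhs = rules[name]
        if lhs.issubset(wm):             # the rule is enabled
            if rhs.startswith("~"):
                wm.discard(rhs[1:])      # implicit retraction: negation as absence
            else:
                wm.add(rhs)
    return wm

# r4 and r5 of Example 2, with A and B already derived.
rules = {"r4": ({"A"}, "D"), "r5": ({"B"}, "~D")}
facts = {"A", "B"}

print(sorted(fire(rules, facts, ["r4", "r5"])))  # ['A', 'B']
print(sorted(fire(rules, facts, ["r5", "r4"])))  # ['A', 'B', 'D']
```

Only the first ordering performs an actual retraction of D, so observing a retraction is sufficient but not necessary evidence of the underlying inconsistency.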


7 If $f_3$ is not a legal input, then rule $r_2$ can never be enabled because of the unsatisfiability of its LHS. As a result, the rule will be picked up by the incompleteness checking and classified as an incomplete case.

8 The Church-Rosser property of a rule-based system refers to the fact that the order in which rules are fired does not affect the final values produced [51].

Example 3. Given a KB containing the following rules and facts

$r_1: P \wedge Q \rightarrow R$        $f_1: P$
$r_2: R \rightarrow W$        $f_2: Q$
$r_3: W \rightarrow A$
$r_4: A \rightarrow \neg P$

we can show that there is no model for the KB; thus, the KB is inconsistent.


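Proof. We convert the KB into the set

$\Omega_2 = \{\neg P\,\neg Q\,R,\ \neg R\,W,\ \neg W\,A,\ \neg A\,\neg P,\ P,\ Q\}$

Let $\mathcal{I}$ be any interpretation for $\Omega_2$.

• If $\mathcal{I}$ is a model for $\Omega_2$, then $\models_{\mathcal{I}} P$ and $\models_{\mathcal{I}} Q$;

• There must be $\models_{\mathcal{I}} \neg A$, $\models_{\mathcal{I}} \neg W$ and $\models_{\mathcal{I}} \neg R$, respectively, in order for $\neg A\,\neg P$, $\neg W\,A$, and $\neg R\,W$ to be true under $\mathcal{I}$;

• However, there must be $\models_{\mathcal{I}} R$ according to the first element in $\Omega_2$. $R$ and $\neg R$ cannot both be true under $\mathcal{I}$. As a result, one of the clauses $\neg P\,\neg Q\,R$ and $\neg R\,W$ must be false under $\mathcal{I}$.

• Since $\mathcal{I}$ cannot satisfy all rules in $\Omega_2$, it is not a model for $\Omega_2$. Because $\mathcal{I}$ is an arbitrary interpretation, there is no model for $\Omega_2$. Thus, the given KB is inconsistent. □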

The inconsistency in Example 3 is of type II-1 because $r_1$ has a condition $P$ and results in the derivation of $\neg P$. Type II inconsistency not only introduces a logical contradiction into the inference process, it also has other pragmatic ramifications:

• In Example 3, the inconsistency involves a pair of complementary literals. When $r_4$ is fired, it causes $P$ to be removed from WM, thus either preventing those rules that rely on $P$ as input from being enabled or deactivating those rules that are enabled as a result of $P$.

• A list of synonymous literals and a list of mutually exclusive literals must be declared and maintained as a KB is being built and modified. In addition to Definition 6 (footnote 9), the following should be used to maintain the validity of WM:

If $(P \sim Q) \wedge (Q \in WM)$, then $KB \vdash P$ would result in $(WM - \{Q\}) \cup \{P\}$.

If $(P \mathbin{\hat{}} Q) \wedge (Q \in WM)$, then $KB \vdash P$ would result in $(WM - \{Q\})$.

• Computationally, when $\neg P$ is a derived fact, the inference engine will check not only for the presence of $P$ in WM, but also for the presence of some literal synonymous to $P$. Alternatively, before a derived fact $P$ gets deposited into WM, the inference system also needs to check for the presence of a $Q$ in WM that is mutually exclusive to $P$.
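The two checks in the last bullet can be sketched as follows. The dictionaries `synonyms` and `exclusive`, the function name `conflicts_on_assert`, and the example predicates are all hypothetical; the sketch only reports which facts in WM clash with a newly derived literal and leaves the actual update policy to the inference engine.

```python
def conflicts_on_assert(literal, wm, synonyms, exclusive):
    """Return the WM facts that clash with a newly derived literal.

    synonyms / exclusive map an atom to the set of atoms declared
    synonymous / mutually exclusive with it (assumed data structures).
    """
    if literal.startswith("~"):
        atom = literal[1:]
        # ~P clashes with P itself and with any synonym of P present in WM.
        candidates = {atom} | synonyms.get(atom, set())
    else:
        # P clashes with any fact in WM declared mutually exclusive with P.
        candidates = exclusive.get(literal, set())
    return candidates & wm

wm = {"cheap", "roomy"}
synonyms = {"spacious": {"roomy"}}
exclusive = {"expensive": {"cheap"}}

print(conflicts_on_assert("~spacious", wm, synonyms, exclusive))  # {'roomy'}
print(conflicts_on_assert("expensive", wm, synonyms, exclusive))  # {'cheap'}
print(conflicts_on_assert("spacious", wm, synonyms, exclusive))   # set()
```

Every derived fact now costs extra lookups in both lists, which is one concrete form of the computational cost that the use of such literals brings.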


9 Definition 6 now needs to be modified to reflect the impact of synonymous literals on the occurrence of $\neg P$ in LHS or RHS of a rule.


Though the use of synonymous and mutually exclusive literals may aid the expressive power of the language, their potential complications in system correctness should never be underestimated and their computational cost should not be ignored. Therefore, the use of those literals, especially synonymous literals, should be judicious.


4.  KB  redundancy 

4.1.  Definition  of  redundancy 

Though redundancy may not cause logical problems (i.e. it has no effect on the set of deducible literals), it may lead to the following situations where potential problems may arise:

• During KB maintenance or evolution, if one of the redundant rules is modified and the others remain unchanged, then the updated KB will not correspond to the intended change, and inconsistencies can be introduced as well;

• For a KB where no certainty factors are utilized, redundant rules may be enabled under a given state, thus resulting in a performance slowdown because all the enabled redundant rules may be fired, even though the firings of those redundant rules will yield the same set of literals (conclusions);

• For a KB containing certainty factors, redundancy will become a serious problem, the reason being that each redundant rule may be fired, resulting in multiple countings of the same information, which, in turn, erroneously increases the level of confidence assigned to the derived literals (conclusions). This may ultimately impact the set of deducible literals.

If redundancy is introduced by design to speed up some classes of frequent deductions, then it is usually confined to a subset of the cases (e.g. types I-2, I-3, I-5 in Table 4). We can always isolate those "useful" redundant rules, and weed out redundancy from the KB where there is supposed to be none.


Definition 14. For a set S of rules, we define a function $\psi$ which returns the number of distinct literals in S. If both $L$ and $\neg L$ are in S, they will be counted as two different literals.


Definition 15. Given a set S of rules, if we can construct a set S' of rules such that $S \equiv S'$ and

(a) either $S' = S - \Delta$, where $\Delta \neq \emptyset$ and $\Delta \subset S$;

(b) or $S' = \phi(S)$, where $\phi$ is a transformation on S such that $|S'| = |S|$ and $\psi(S') < \psi(S)$;

then there is redundancy in S.

Table 4
Types of redundancy

Type I-1
Description: Rules having the same conclusion but different permutations of the same set of conditions
Pattern: $(RHS(r_i) = RHS(r_j)) \wedge (LHS(r_i) \in p(L)) \wedge (LHS(r_j) \in p(L))$, where L is a set of literals

Type I-2
Description: A rule $r_k$ which can be deduced from a set of rules
Pattern: $\{r_1, \ldots, r_n\} \vdash r_k$, where $(RHS(r_1) = LHS(\ldots)) \wedge \cdots \wedge (RHS(\ldots) = LHS(r_n)) \wedge (LHS(r_1) = LHS(r_k)) \wedge (RHS(r_n) = RHS(r_k))$

Type I-3
Description: A rule $r_i$ which is a specialization of another rule $r_j$
Pattern: $(LHS(r_i) < LHS(r_j)) \wedge (RHS(r_i) < RHS(r_j))$, where $LHS(r_i)$ and $RHS(r_i)$ are specializations based on the same set of substitutions

Type I-4
Description: A rule $r_i$ which is subsumed by another rule
Pattern: $(LHS(r_i) \subset LHS(r_j)) \wedge (RHS(r_i) = RHS(r_j))$

Type I-5
Description: Generalized subsumed rule ($r_k$ is subsumed by $r_i$ and $r_j$)
Pattern: $(RHS(r_i) \subset LHS(r_j)) \wedge (RHS(r_j) = RHS(r_k)) \wedge (LHS(r_k) = (LHS(r_i) \cup LHS(r_j) - RHS(r_i)))$

Type I-6
Description: Rules with same condition(s) and synonymous conclusions
Pattern: $(RHS(r_i) \approx RHS(r_j)) \wedge (LHS(r_i) = LHS(r_j))$

Type I-7
Description: Rules with synonymous conditions and same conclusion
Pattern: $(RHS(r_i) = RHS(r_j)) \wedge (LHS(r_i) \approx LHS(r_j))$

Type I-8
Description: Rules with synonymous conditions and synonymous conclusion
Pattern: $(RHS(r_i) \approx RHS(r_j)) \wedge (LHS(r_i) \approx LHS(r_j))$

Type II-1
Description: Two rules which have the same or synonymous conclusion but contain pair(s) of conflict literals in their conditions
Pattern: $((RHS(r_i) = RHS(r_j)) \vee (RHS(r_i) \approx RHS(r_j))) \wedge (LHS(r_i) = L \cup \{P\}) \wedge (LHS(r_j) = L \cup \{Q\})$, where L is a set of literals and P and Q are in conflict

Type II-2
Description: A rule with redundant condition(s)
Pattern: $(P \in LHS(r)) \wedge (P' \in LHS(r)) \wedge ((P = P') \vee (P \approx P') \vee (P < P'))$

Type II-3
Description: Two rules sharing the same conclusion, and one rule having a singleton condition that is in conflict with a condition of another rule
Pattern: $(RHS(r_i) = RHS(r_j)) \wedge (LHS(r_i) = L \cup \{P\}) \wedge (LHS(r_j) = \{Q\})$, where L is a set of literals and P and Q are in conflict

4.2.  Commonly  found  types  of  redundancy 

If either of the conditions in Definition 15 holds for a given RB, then the RB is said to contain redundancy. Thus, in essence, all types of redundancy are captured by Definition 15. However, in practice, some types of redundancy are commonly found. Table 4 lists the frequently encountered types of redundancy. Type I redundancy in Table 4 involves redundant rule(s) and Type II involves redundant (or unnecessary) literal(s). Each type encompasses a set of specific cases.

4.3.  Analysis 

Given a set S of rules, $S \vdash C$ indicates the set C of conclusions derivable from S. If we can construct a set S' of rules from S such that Property (a) in Definition 15 is satisfied, we further divide C into C' and C'' where $S' \vdash C'$ and $\Delta \vdash C''$. We can prove that if $S' \equiv S$, then $S' \models \Delta$. According to Theorem 1, for every rule $p \in \Delta$, $S' \rightarrow p$ is valid, thus $C'' \subseteq C'$ and $C = C'$. Therefore, rules in $\Delta$ are redundant. During the analysis process we can select a model $\mathcal{I}$ for S' with regard to the enabling facts and obtain C' from S', and then obtain C'' from $\Delta$ to show $C'' \subseteq C'$.

When S' is constructed with Property (b) of Definition 15, the number of literals in S' is reduced, even though the number of rules remains the same. Similar analysis can be carried out to prove that $C = C'$. Since S' either contains fewer rules or has fewer literals, we can use S' to replace S. Examples 4 and 5 are used to demonstrate the analysis process.


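Example 4. Given the following set S of rules

$r_1: P \wedge Q \rightarrow R$
$r_2: A \wedge B \rightarrow U$
$r_3: U \wedge V \rightarrow W$
$r_4: R \wedge W \rightarrow D$
$r_5: P \wedge Q \wedge A \wedge B \wedge V \rightarrow D$

Let $S' = S - \{r_5\}$. We can show that $S' \equiv S$ and $r_5$ is redundant.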


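Proof. We first convert S and S' into the abbreviated format:

$S = \{\neg P\,\neg Q\,R,\ \neg A\,\neg B\,U,\ \neg U\,\neg V\,W,\ \neg R\,\neg W\,D,\ \neg P\,\neg Q\,\neg A\,\neg B\,\neg V\,D\}$

$S' = \{\neg P\,\neg Q\,R,\ \neg A\,\neg B\,U,\ \neg U\,\neg V\,W,\ \neg R\,\neg W\,D\}$

Let $\mathcal{I}$ be an interpretation. Two situations need to be considered:

1. If $\models_{\mathcal{I}} S$, then $\models_{\mathcal{I}} S'$ is obvious. This is a trivial case.

2. If $\models_{\mathcal{I}} S'$, we need to show that $\models_{\mathcal{I}} S$ also holds. This boils down to proving that $\models_{\mathcal{I}} r_5$. Since $\models_{\mathcal{I}} r_4$ in S', we must have $\models_{\mathcal{I}} D$ or $\models_{\mathcal{I}} \neg R$ or $\models_{\mathcal{I}} \neg W$.

Case 1: If $\models_{\mathcal{I}} D$, then $\models_{\mathcal{I}} r_5$.

Case 2: If $\models_{\mathcal{I}} \neg R$, then $\models_{\mathcal{I}} \neg P$ or $\models_{\mathcal{I}} \neg Q$ because $\models_{\mathcal{I}} r_1$. Hence, $\models_{\mathcal{I}} r_5$.

Case 3: If $\models_{\mathcal{I}} \neg W$, then $\models_{\mathcal{I}} \neg U$ or $\models_{\mathcal{I}} \neg V$ because $\models_{\mathcal{I}} r_3$.

If $\models_{\mathcal{I}} \neg V$, then $\models_{\mathcal{I}} r_5$.

If $\models_{\mathcal{I}} \neg U$, then either $\models_{\mathcal{I}} \neg A$ or $\models_{\mathcal{I}} \neg B$ because $\models_{\mathcal{I}} r_2$. Thus, $\models_{\mathcal{I}} r_5$.

Therefore, if S' is satisfied under $\mathcal{I}$, so is S; $S' \equiv S$. If we choose a model $\mathcal{I}_0$ for S' in which $\models_{\mathcal{I}_0} \{P, Q, A, B, V\}$, $\mathcal{I}_0$ is also a model for $r_5$. The sets of derivable facts from S' and $r_5$ are $C' = \{R, U, W, D\}$ and $C'' = \{D\}$, respectively. Obviously, $C'' \subseteq C'$, therefore $r_5$ is redundant. □

S is of redundancy type I-5. Removing $r_5$ will eliminate the redundancy.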
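For a small propositional rule set, the equivalence required by Property (a) of Definition 15 can also be checked mechanically. The sketch below does this for Example 4 by comparing truth tables; it is a brute-force stand-in for the case analysis in the proof, and the clause encoding (`~P` for negation) and the function names are assumptions of this illustration.

```python
from itertools import product

def satisfied(clauses, env):
    """True if every clause has at least one literal that holds under env."""
    return all(any(env[l.lstrip("~")] != l.startswith("~") for l in c.split())
               for c in clauses)

def equivalent(s1, s2):
    """True if s1 and s2 are satisfied by exactly the same truth assignments."""
    atoms = sorted({l.lstrip("~") for c in s1 + s2 for l in c.split()})
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if satisfied(s1, env) != satisfied(s2, env):
            return False
    return True

# Example 4 in abbreviated clause form: r1 .. r5.
S = ["~P ~Q R", "~A ~B U", "~U ~V W", "~R ~W D", "~P ~Q ~A ~B ~V D"]
S_prime = S[:4]                 # S' = S - {r5}

print(equivalent(S, S_prime))   # True: r5 is redundant

S_without_r4 = S[:3] + S[4:]    # dropping r4 instead changes the meaning
print(equivalent(S, S_without_r4))  # False
```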


Example 5. Given the following set S of rules

$r_1: P \wedge Q \wedge W \rightarrow R$
$r_2: \neg Q \rightarrow R$

Let $\phi$ be a transformation that results in a rule $r_1'$ by eliminating the literal $Q$ from $r_1$, and let $S' = \{r_1', r_2\}$. We can show that $S' \equiv S$ and the literal $Q$ is redundant (or unnecessary).


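Proof. We first convert S and S' into the format below:

$S = \{\neg P\,\neg Q\,\neg W\,R,\ Q\,R\}$, $S' = \{\neg P\,\neg W\,R,\ Q\,R\}$

Let $\mathcal{I}$ be an interpretation. Two cases need to be considered:

1. If $\models_{\mathcal{I}} S'$, then $\models_{\mathcal{I}} S$ is trivial.

2. If $\models_{\mathcal{I}} S$, we need to show that $\models_{\mathcal{I}} S'$ also holds. This boils down to proving that whenever S is satisfied by $\mathcal{I}$, $\models_{\mathcal{I}} r_1'$. Since $\models_{\mathcal{I}} r_2$ in S, we must have $\models_{\mathcal{I}} Q$ or $\models_{\mathcal{I}} R$.

Case 1: If $\models_{\mathcal{I}} R$, then $\models_{\mathcal{I}} r_1'$;

Case 2: If $\models_{\mathcal{I}} Q$ and $\not\models_{\mathcal{I}} R$ (footnote 10), then $\models_{\mathcal{I}} \neg P$ or $\models_{\mathcal{I}} \neg W$ must be true because $\models_{\mathcal{I}} r_1$. Hence, $\models_{\mathcal{I}} r_1'$.

Therefore, $S' \equiv S$ and the literal $Q$ in $r_1$ is redundant. □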


S is of redundancy type II-3. Correcting Type II redundancy involves removing the literal(s) in question. For instance, for Type II-3, when RB contains a rule set S matching the pattern, it can be replaced by the corresponding rule set S' as shown below:

S: $r_1: P_1 \wedge \cdots \wedge P_k \wedge Q \rightarrow R$, $k \geq 1$
$r_2: \neg Q \rightarrow R$

S': $r_1: P_1 \wedge \cdots \wedge P_k \rightarrow R$
$r_2: \neg Q \rightarrow R$

10 $\not\models_{\mathcal{I}} R$ indicates that $R$ evaluates to false under $\mathcal{I}$.
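Property (b) of Definition 15 can be checked in the same brute-force style. The sketch below, written for the two-rule set of Example 5, counts distinct literals with a function `psi` modelled on Definition 14 and confirms that dropping $Q$ from $r_1$ keeps the rule set equivalent. The clause encoding and all function names are assumptions of this example.

```python
from itertools import product

def psi(clauses):
    """Number of distinct literals; L and ~L count separately (Definition 14)."""
    return len({l for c in clauses for l in c.split()})

def satisfied(clauses, env):
    """True if every clause has at least one literal that holds under env."""
    return all(any(env[l.lstrip("~")] != l.startswith("~") for l in c.split())
               for c in clauses)

def equivalent(s1, s2):
    """True if s1 and s2 are satisfied by exactly the same truth assignments."""
    atoms = sorted({l.lstrip("~") for c in s1 + s2 for l in c.split()})
    return all(
        satisfied(s1, dict(zip(atoms, v))) == satisfied(s2, dict(zip(atoms, v)))
        for v in product([False, True], repeat=len(atoms)))

def redundant_by_transformation(s, s_prime):
    """Definition 15(b): same number of rules, fewer literals, same meaning."""
    return (len(s) == len(s_prime)
            and psi(s_prime) < psi(s)
            and equivalent(s, s_prime))

S = ["~P ~Q ~W R", "Q R"]        # r1, r2 of Example 5 in clause form
S_prime = ["~P ~W R", "Q R"]     # r1 with the literal Q eliminated

print(psi(S), psi(S_prime))                     # 5 4
print(redundant_by_transformation(S, S_prime))  # True
```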


5.  KB  circularity 

5.1.  Definition  of  circularity 

Circularity in a KB has been informally defined as a set of rules forming a cycle [7,24,30]. What exactly a circularity entails semantically is not that clear in the literature. In this section, we provide a definition of KB circularity in terms of the derivation of tautologous rules and argue that the phenomenon reflects an anomalous situation in a KB and has both operational and semantic ramifications.


Definition 16. A rule E is tautologous, denoted as E, if it contains a complementary or an incompatible pair of literals.


Example 6. Following are two tautologous rules:

• $P \wedge Q \rightarrow P$, where $\neg P$ and $P$ are a complementary pair (in $\neg P \vee \neg Q \vee P$).

• $high\_priced(x) \wedge spacious(x) \rightarrow expensive(x)$, where $\neg high\_priced(x)$ and $expensive(x)$ are an incompatible pair (in $\neg high\_priced(x) \vee \neg spacious(x) \vee expensive(x)$).


Definition  17.  A  nonempty  set  S  of  rules  is  circular  if  we 
can  deduce  a  tautologous  rule  from  S. 


Definition  18.  A  nonempty  set  S  of  rules  is  minimally 
circular,  denoted  as  S,  if  S  is  circular  and  no  proper  subset 
of  S  is  circular. 


Given S, rules in S are said to be forming a cycle. The deduction of a tautologous rule is trivial if S is a singleton set satisfying the aforementioned condition. In a given S, there may be more than one tautologous rule deducible from it that involves different pairs of (complementary or incompatible) literals.

Operationally speaking, circular rules may result in infinite loops (if an exiting condition is not properly defined) during inference, thus hampering the problem solving process. Semantically speaking, the fact that a tautologous wff is derivable indicates that the circular rule set encompasses knowledge that is always true regardless of any problem specific information. In general, tautologous wffs are those that are true by virtue of their logical form and thus provide no useful information about the domain being described [47]. Therefore, circular rules prove to be less useful in the problem solving process. What is needed, as evidenced in many real KB systems, are consistent rules that are triggered by problem specific information (facts) rather than tautologous rules that are true regardless of the problem to be solved.

5.2.  Types  of  circularity 

Circularity primarily stems from the definitions of rules in RB. However, control strategies deployed (in places such as the mechanisms of agendas, rule salience or priority level definitions and module selections) in the inference system may also cause the infinite looping of certain rules. In this paper, we focus on the types of circularity that are confined to the RB.


Definition 19. Given a minimally circular rule set S, we define two sets of literals $S_L$ and $S_R$ as follows:

$S_L = \{L \mid L \in LHS(r) \wedge r \in S\}$

$S_R = \{L \mid L \in RHS(r) \wedge r \in S\}$.

The types of circularity in a rule base, as summarized in Table 5, are classified based on enumerating possible relationships between $S_L$ and $S_R$ and the nature of the tautology. Type I circularity indicates cycles in which $S_L = S_R$. Type II describes cycles with additional conditions involved in the rules; therefore, $S_R$ is a proper subset of $S_L$. If $C_S$ is a cycle formed out of a minimally circular rule set S, the girth g of $C_S$ can be defined as $g(C_S) = |S|$. Cycles in these types can have a girth ranging from one to some integer MAX, where MAX is bounded by the cardinality of the rule base $|RB|$ of a given KB.

5.3. Analysis

The analysis of KB circularity amounts to deriving from a given rule base a tautologous rule r that satisfies the conditions in Definition 16, using some inference method.

Example 7. Below is a rule base S containing five rules

$r_1: W \rightarrow U$
$r_2: P \wedge A \rightarrow R$
$r_3: Q \wedge C \rightarrow W$
$r_4: R \wedge B \rightarrow Q$
$r_5: U \wedge D \wedge E \wedge G \rightarrow P$

Using the resolution method, we can derive a tautologous rule from S. Since S is the smallest set that yields such a tautologous rule, it is thus minimally circular.

Proof. We convert S into the following format

$S = \{\neg W\,U,\ \neg P\,\neg A\,R,\ \neg Q\,\neg C\,W,\ \neg R\,\neg B\,Q,\ \neg U\,\neg D\,\neg E\,\neg G\,P\}$.

It is not difficult to see that the following rule is derivable from S by using the resolution method

$\neg W\,\neg D\,\neg E\,\neg G\,\neg A\,\neg B\,\neg C\,W$.

Since $\neg W$ and $W$ are a pair of complementary literals, the derived rule is tautologous. Therefore, S is minimally circular. □

Incidentally, there are four other tautologous rules involving $\neg P$ and $P$, $\neg Q$ and $Q$, $\neg R$ and $R$, and $\neg U$ and $U$, respectively. This example exhibits Type II-1 circularity.
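For propositional rules with positive literals, such as those of Example 7, a cycle can also be found by a purely syntactic search in the graph that links each condition to the conclusion of its rule. The sketch below is such a search; it is a simplified stand-in for the resolution-based derivation, it ignores incompatible pairs, and the rule encoding and the name `find_cycle` are assumptions of this illustration.

```python
def find_cycle(rules):
    """Return one cycle of atoms in the condition -> conclusion graph, or None."""
    graph = {}
    for lhs, rhs in rules.values():
        for cond in lhs:
            graph.setdefault(cond, set()).add(rhs)

    def visit(node, path):
        if node in path:                       # came back to an atom on the path
            return path[path.index(node):]
        for nxt in sorted(graph.get(node, ())):
            found = visit(nxt, path + [node])
            if found:
                return found
        return None

    for start in sorted(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

# Example 7: each rule is (set of conditions, conclusion).
rules = {
    "r1": ({"W"}, "U"),
    "r2": ({"P", "A"}, "R"),
    "r3": ({"Q", "C"}, "W"),
    "r4": ({"R", "B"}, "Q"),
    "r5": ({"U", "D", "E", "G"}, "P"),
}

print(find_cycle(rules))        # ['R', 'Q', 'W', 'U', 'P']

broken = {name: rule for name, rule in rules.items() if name != "r5"}
print(find_cycle(broken))       # None: removing r5 breaks the cycle
```

The five atoms on the cycle are exactly the literals for which a tautologous rule is derivable in Example 7, and the atoms A, B, C, D, E, G are the additional conditions that make $S_R$ a proper subset of $S_L$.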

Once  a  circularity  is  detected,  the  circular  rule  set  needs 
to  be  syntactically  redefined  to  break  up  the  circularity. 
Semantically,  information  about  a  problem  domain  needs 
to  be  reorganized  so  that  it  will  contribute  to  the  problem 
solving  process.  Some  of  the  possible  remedial  measures  for 
circularity  can  be  found  in  Section  7. 

6.  KB  incompleteness 

Informally speaking, a KB is incomplete when it does not have all the necessary information to answer a question of interest in an intended application [16,31]. Thus, completeness represents a query-centric measure for the quality of a KB. KB incompleteness is a real issue to be reckoned with for at least the following reasons: (a) In many applications, the KB is built in an incremental and piecemeal fashion and it undergoes a continual evolution. The information acquired at each stage of the evolution may be vague or indefinite in nature. (b) The deployment of a KB system cannot just wait for the KB to be stabilized in some final and complete form since this may never happen.

Despite the fact that a practical KB can never completely capture all aspects of a real problem domain, it is still possible for a KB to be complete for a specific area in the domain. The boundaries of this specific area may be defined in terms of all relevant queries to be asked during the problem solving process. If a KB has all the information to answer those relevant queries definitely, then the KB is complete with regard to those queries. In what follows, we base our discussions of completeness on the concepts of relevant queries and the ability of a KB to answer those queries.

Table 5
Types of circularity

Type I-1
Description: $S_L = S_R$ for S and tautologous rule involves complementary pair of literals
Pattern: $(S_L = S_R) \wedge (S \vdash E) \wedge (L, L' \in E)$, where $L$ and $L'$ are complementary

Type I-2
Description: $S_L = S_R$ for S and tautologous rule involves pair of incompatible literals
Pattern: $(S_L = S_R) \wedge (S \vdash E) \wedge (L, L' \in E)$, where $L$ and $L'$ are incompatible

Type II-1
Description: $S_R \subset S_L$ for S and tautologous rule involves complementary pair of literals
Pattern: $(S_R \subset S_L) \wedge (S \vdash E) \wedge (L, L' \in E)$, where $L$ and $L'$ are complementary

Type II-2
Description: $S_R \subset S_L$ for S and tautologous rule involves pair of incompatible literals
Pattern: $(S_R \subset S_L) \wedge (S \vdash E) \wedge (L, L' \in E)$, where $L$ and $L'$ are incompatible

6.1. Definition of query-based incompleteness

Definition 20. Given a KB, we define $P_{KB}$ and $P_A$ as sets of all predicate symbols and askable predicate symbols in the KB, respectively. An askable predicate symbol is one that can appear in a query. Usually it is the case that $P_{KB} \supseteq P_A$ (footnote 11). A query Q containing predicate symbols $p_1, \ldots, p_n \in P_A$ is denoted as $Q(p_1, \ldots, p_n)$ (footnote 12).

Definition 21. A set $\mathcal{Q}$ of relevant queries is now defined as follows:

$\mathcal{Q} = \{Q \mid Q \text{ appears in some query session} \wedge Q = Q(p_1, \ldots, p_n) \wedge p_1, \ldots, p_n \in P_A\}$.

Definition 22. Given a query $Q \in \mathcal{Q}$, the answer to Q, denoted as $\alpha(Q)$, can be either definite or unknown.

Definition 23. A KB is complete with regard to a relevant query set $\mathcal{Q}$ if $\forall Q \in \mathcal{Q}\,[\alpha(Q) \text{ is definite}]$.

11 When there is incompleteness in a KB, this may not be true, as evidenced in Table 6.

12 We assume that the query Q is a conjunction of the literals containing predicate symbols $p_1, \ldots, p_n$.

6.2. Types of incompleteness

Let $P = P_{KB} \cup P_A$. For a predicate symbol $p \in P$, we introduce a set of predicate symbols $\Re(p)$ on which p directly or indirectly depends. $\Re(p)$ can be obtained using the following procedure.

INPUT: $p \in P$
OUTPUT: $\Re(p)$
$\Re(p) := \emptyset$;
while $\exists r \in KB\,[p \in RHS(r) \wedge LHS(r) \not\subseteq \Re(p)]$ do $\Re(p) := \Re(p) \cup LHS(r)$;
while $\exists r \in KB\,\exists q\,[q \in RHS(r) \wedge q \in \Re(p) \wedge LHS(r) \not\subseteq \Re(p)]$ do $\Re(p) := \Re(p) \cup LHS(r)$;

If a predicate symbol p cannot be satisfied by either a given fact or a derived fact, then it is denoted as $\nvdash p$. Three types of incompleteness are defined in Table 6. Types I and II reveal KB incompleteness from the perspective of relevant queries, i.e., lack of necessary information to answer queries, and Type III indicates the potential incompleteness of the relevant query set $\mathcal{Q}$ from the perspective of known information (rules/facts).

Though the classification in Table 6 is exhaustive with regard to Definition 23, there are pragmatic and application specific considerations that will help determine the validity of incompleteness cases.
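The procedure for the dependency set can be written as a short fixpoint loop. The sketch below assumes that each rule is reduced to the predicate symbols in its LHS and the single predicate symbol in its RHS; the function name `depends_on` and the small rule base are hypothetical and are not taken from the paper.

```python
def depends_on(p, rules):
    """Fixpoint computation of the predicate symbols p depends on, directly or indirectly."""
    deps = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # A rule contributes if it concludes p or something p already depends on.
            if (rhs == p or rhs in deps) and not lhs.issubset(deps):
                deps |= lhs
                changed = True
    return deps

# Hypothetical rule base: (predicate symbols in LHS, predicate symbol in RHS).
rules = [
    ({"a", "b"}, "goal"),
    ({"c"}, "a"),
    ({"d"}, "e"),
]

print(sorted(depends_on("goal", rules)))     # ['a', 'b', 'c']
print(sorted(depends_on("e", rules)))        # ['d']
print(sorted(depends_on("missing", rules)))  # []
```

An askable predicate symbol with an empty dependency set that does not otherwise occur in the KB, like `missing` here, is the symptom that Table 6 lists as Type II.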


6.3.  Analysis 

The analysis of KB incompleteness depends critically on the availability of information regarding the relevant query set in a problem domain. Prototyping often serves as a means to ascertain the relevant query set. If the relevant query set is available, the analysis amounts to finding out if all queries can be answered definitely. Checking for the presence or absence of the aforementioned syntactic symptoms is an integral and necessary part of the analysis process. However, there are other considerations in the analysis process that are semantic, pragmatic, or problem specific. The analysis process is really an iterative one, because as the KB continually evolves, so will the relevant query set.

Table 6
Types of incompleteness

Type I
Descriptions [7,24,37]: Dangling conditions, unreachable conclusions
Pattern: $\exists q \in P\ \exists p \in P_A\,[q \in \Re(p) \wedge \nvdash q]$ (footnote 13)

Type II
Descriptions [7,24,37]: Missing initial facts, missing rules
Pattern: $\exists p \in P_A\,[\Re(p) = \emptyset \wedge p \notin P_{KB}]$

Type III
Descriptions [7,24,37]: Useless conclusions, unused initial facts, isolated rules
Pattern: $\exists q \in P_{KB}\ \forall p \in P_A\,[q \notin \Re(p)]$

13 Because the criterion for the completeness issue is domain-specific, it is possible that $q$ in $[q \in \Re(p) \wedge \nvdash q]$ may be a useless structure in the KB. Ultimately, the domain expert or knowledge engineer has to determine the nature of the anomaly.

Example 8. For the following KB,

$r_1: h(x, y) \wedge r(y, z) \rightarrow p_1(x, z)$        $f_1: m(d)$
$r_2: n(y) \wedge u(x) \rightarrow r(x, y)$        $f_2: v(a)$
$r_3: v(x) \rightarrow w(x)$        $f_3: u(b)$
$r_4: m(x) \rightarrow p_3(x)$        $f_4: w(c)$

we have

$P_{KB} = \ldots$
$\Re(p_1) = \{h, r, n, v, u\}$
$\Re(p_2) = \emptyset$.

Since $p_2 \in P_A$ and $[\Re(p_2) = \emptyset \wedge p_2 \notin P_{KB}]$, there exists Type II incompleteness. No rules and facts could be used to answer queries involving $p_2$. In addition, $h \in \Re(p_1)$ and $\nvdash h$. So Type I incompleteness also exists. Finally, the presence of the rule $r_4$ and the fact $f_1$ may indicate that $p_3$ should have been an askable predicate. In other words, $P_A$ is incomplete, and there is reason to believe that the relevant query set is incomplete also. □

7.  Remedial  measures 

Once KB anomalies are identified, the next issue is how to correct the situations in which the quality of a KB has been compromised. Though it is of pivotal importance, the issue has not been adequately addressed in the literature. To a certain extent, this is due to the fact that the issue of how to mend a KB relies on a whole host of considerations, many of which are problem or application specific. In the rest of this section, we would like to address the issue in terms of some general principles and provide some example remedial measures for the cases dealt with in the previous four sections.

For correcting inconsistency, we suggest the following actions:


•  Avoid  using  synonymous  literals  if  possible. 

•  Delete  one  of  the  offending  rules  that  derives  the  conflict 
conclusion. 

•  Modify  the  conditions  (e.g.  predicate  symbols)  of  the 
rules  involved  such  that  they  no  longer  have  or  share 
the  same  or  synonymous  conditions. 

•  Modify  the  conclusions  (e.g.  predicate  symbols)  of  the 
rules  involved  such  that  they  are  no  longer  in  conflict. 

•  Move  one  of  the  offending  rules  to  a  different  rule 
module  such  that  the  derivation  of  conflict  conclusions 
cannot  take  place  in  the  same  problem-solving  session  or 
at  the  same  time. 

Actions  to  eliminate  redundancy  may  include: 

•  Delete  redundant  rule(s). 

•  Merge  or  collapse  rules  into  one. 

For example, $P \wedge Q \to R,\ \neg P \wedge Q \to R \Rightarrow Q \to R$.

•  Delete  condition(s)  of  certain  rule(s). 

For example, $P \wedge Q \to R,\ \neg Q \to R \Rightarrow P \to R,\ \neg Q \to R$.

• Modify the conditions or conclusions of the redundant
rules such that they no longer are the same or synonymous.

To  resolve  circularity,  the  following  remedial  measures 
may  be  taken: 

•  Remove  a  rule  from  a  circular  rule  set. 

For example, $P \to Q,\ Q \to R,\ R \to P \Rightarrow P \to Q,\ Q \to R$.

•  Redefine  a  conclusion  of  a  rule  in  the  set  such  that  it  no 
longer  serves  as  a  condition  of  another  rule  in  the  set. 

For example, $P \to Q,\ Q \to R,\ R \to P \Rightarrow P \to Q,\ Q \to R',\ R \to P$,
where $R'$ and $R$ are no longer unifiable.

•  Redefine  a  condition  of  a  rule  in  the  set  such  that  it  no 
longer  matches  a  conclusion  of  another  rule  in  the  set. 
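
To illustrate the first of these measures, here is a minimal sketch in Python. It assumes propositional rules written as (condition, conclusion) pairs; that representation and the name `find_cycle` are assumptions made for illustration and are not part of the paper. The sketch finds one circular chain by depth-first search and then removes one rule from it.

```python
def find_cycle(rules):
    """Return the rules of one circular chain, or None if there is none.

    rules: list of (condition, conclusion) pairs, e.g. ("P", "Q") for P -> Q.
    """
    graph = {}
    for cond, concl in rules:
        graph.setdefault(cond, []).append(concl)

    visiting, done = set(), set()

    def dfs(node, path):
        visiting.add(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:
                # nxt is already on the current path, so the chain closes here.
                nodes = path[path.index(nxt):] + [nxt]
                return list(zip(nodes, nodes[1:]))
            if nxt not in done:
                found = dfs(nxt, path + [nxt])
                if found:
                    return found
        visiting.discard(node)
        done.add(node)
        return None

    for start in list(graph):
        if start not in done:
            found = dfs(start, [start])
            if found:
                return found
    return None


rules = [("P", "Q"), ("Q", "R"), ("R", "P")]
cycle = find_cycle(rules)
# Remedy: remove one rule of the circular set (here the rule that closes it).
repaired = [r for r in rules if r != cycle[-1]]
```

Which rule to remove is a design decision that depends on the application; the sketch simply drops the rule that closes the chain.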

To  plug  holes  in  an  incomplete  KB,  we  could 

•  Add  new  rules  and/or  facts  to  make  all  relevant  queries 
definite. 

For  example,  new  rules  and  facts  can  be  added  to  make 
h{ a\  v)  satisfiable  in  Example  8. 

•  Modify  the  initial  facts  to  patch  up  holes. 

• Modify the conditions and/or conclusions of rules
involved in an incompleteness case so that they will
be "connected" with the rest of RB.

Though it is beyond the scope of this paper, we would like
to point out that in a KB where certainty factors (CF) are
used, there are additional actions to be considered. For
instance, add or modify CF values for rules or facts, or
modify the threshold value(s) for the propagation during the
inference process.


8.  Concluding  remarks 

As more and more expert systems and knowledge-based
systems are applied in settings where failures result in
loss of productivity, decision-making quality, property,
business services, investment, or even life, ways to detect
and resolve potential anomalies in a KB become critical
issues in developing correct, accurate and reliable systems.
In order for the results to be credible, V&V techniques must
be built on a solid theoretical foundation.

It is difficult to assess many of the V&V tools, methods
and techniques that have been developed or proposed
because there is no accepted standard against which to
measure the reliability or correctness of an expert system.
Indeed, there is a lack of definite semantics for expert systems
in general and KB in particular. This prevents any definite
conclusions about reliability and hinders the use of expert
systems in safety-critical applications. The field of V&V for
expert systems is far from having tractable formal models
that can cover all of the features of real expert systems,
which often rely on imperative state changes and other
non-logical features. Our simplified model, though a preliminary
one, does provide a basis for reaching definite
conclusions about the reliability of those aspects in expert
systems that can be expressed in logical terms. It is our hope
that the logical formulation presented in this paper makes a
step in the right direction.

Our work will continue in several directions. One is
concerned with how to establish an assessment standard,
based on logical instruments similar to those discussed in
this paper, for the V&V tools and methodologies. For
instance, given a KB and its semantics $\Gamma$, we use $A_\Gamma$ to
indicate the set of anomalies defined under $\Gamma$. For a V&V
tool M, we use $A_M$ to denote the set of anomalies M is
capable of discovering. M is sound if $\forall a \in A_M\ [a \in A_\Gamma]$. M
is complete if $\forall a \in A_\Gamma\ [a \in A_M]$.

Another direction is to study the KB anomalies in an
object-oriented (OO) paradigm. Recent developments in
knowledge representation formalisms include: (a) extending
the OO paradigm to include rules (i.e. rules can be considered
as a specific type of behavior for objects); (b) bringing
objects into the rule-based paradigm (i.e. rules are specified
about objects); (c) hybrid representation formalism that
blends frames, objects, cases and rules together [48]. The

concepts  S'  *°  iSSUe  °f  h°W  t0  the 

ngs  ot  ra  anoma,ies  in  the — of 

0" vr4<f 50i° Tr d be developed based on its underlying ontology
[49,50]. It is not clear what relationship there is between
the anomalous situations that are manifested at a KB level
and the root causes at its ontology. This is yet another direction
worth exploring.


Acknowledgements

The authors would like to express their sincere apprecia-

Abstract

Machine  learning  deals  with  the  issue  of  how  to  build  programs  that  improve  their 
performance  at  some  task  through  experience.  Machine  learning  algorithms  have  proven 
to  be  of  great  practical  value  in  a  variety  of  application  domains.  They  are  particularly 
useful  for  (a)  poorly  understood  problem  domains  where  little  knowledge  exists  for  the 
humans  to  develop  effective  algorithms;  (b)  domains  where  there  are  large  databases 
containing  valuable  implicit  regularities  to  be  discovered;  or  (c)  domains  where  programs 
must  adapt  to  changing  conditions.  Not  surprisingly,  the  field  of  software  engineering 
turns  out  to  be  a  fertile  ground  where  many  software  development  tasks  could  be 
formulated  as  learning  problems  and  approached  in  terms  of  learning  algorithms.  In  this 
paper,  we  first  take  a  look  at  the  characteristics  and  applicability  of  some  frequently 
utilized  machine  learning  algorithms.  We  then  provide  formulations  of  some  software 
development tasks using learning algorithms. Finally, a brief summary of the
existing work is given.

1. The Challenge

The challenge of modeling software system structures in a fast-moving scenario gives
rise to a number of demanding situations. The first is where software systems must
dynamically adapt to changing conditions. The second is where the domains involved
may be poorly understood. The third is where there may be no
knowledge (though there may be raw data available) to develop effective algorithmic
solutions.

To answer the challenge, a number of approaches can be utilized [1,12]. One such
approach is transformational programming. Under transformational programming,
software is developed, modified, and maintained at the specification level, and then
automatically transformed into production-quality software through automatic program
synthesis [5]. This software development paradigm will enable software engineering to
become the discipline of capturing and automating currently undocumented domain and
design knowledge [10]. Software engineers will deliver knowledge-based application
generators rather than unmodifiable application programs.

To realize the full potential of transformational programming, tools and methodologies
are needed for the various tasks inherent in it. In this paper, we take a look
at how machine learning (ML) algorithms can be used to build tools for software
development and maintenance tasks. The rest of the paper is organized as follows. Section
2 provides an overview of machine learning and frequently used learning algorithms.
Some of the software development and maintenance tasks for which learning algorithms
are applicable are given in Section 3. Formulations of those tasks in terms of the learning
algorithms are discussed in Section 4. Section 5 describes some of the existing work.
Finally, in Section 6, we conclude the paper with remarks on future work.

2.  Machine  Learning  Algorithms 

Machine  learning  deals  with  the  issue  of  how  to  build  computer  programs  that  improve 
their  performance  at  some  task  through  experience  [11].  Machine  learning  algorithms  have 
been  utilized  in:  (1)  data  mining  problems  where  large  databases  may  contain  valuable 
implicit  regularities  that  can  be  discovered  automatically;  (2)  poorly  understood  domains 
where  humans  might  not  have  the  knowledge  needed  to  develop  effective  algorithms;  and 
(3) domains where programs must dynamically adapt to changing conditions [11].
Learning  a  target  function  from  training  data  involves  many  issues  (function 
representation,  how  and  when  to  generate  the  function,  with  what  given  input,  how  to 
evaluate  the  performance  of  generated  function,  and  so  forth).  Figure  1  describes  the 
dimensions  of  the  target  function  learning. 

Major types of learning include: concept learning (CL), decision trees (DT), artificial
neural networks (ANN), Bayesian belief networks (BBN), reinforcement learning (RL),
genetic algorithms (GA) and genetic programming (GP), instance-based learning (IBL),
inductive logic programming (ILP), and analytical learning (AL). Table 1 summarizes the
main properties of different types of learning.

Not  surprisingly,  machine  learning  methods  can  be  (and  some  have  already  been)  used  in 
developing  better  tools  or  software  products.  Our  preliminary  study  identifies  the  software 
development  and  maintenance  tasks  in  the  following  areas  to  be  appropriate  for  machine 
learning  applications:  requirement  engineering  (knowledge  elicitation,  prototyping); 
software  reuse  (application  generators);  testing  and  validation;  maintenance  (software 
understanding);  project  management  (cost,  effort,  or  defect  prediction  or  estimation). 

3.  Software  Engineering  Tasks 

Table  2  contains  a  list  of  software  engineering  tasks  for  which  ML  methods  are  applicable. 
Those  tasks  belong  to  different  life-cycle  processes  of  requirement  specification,  design, 
implementation,  testing  and  maintenance.  This  list  is  by  no  means  a  complete  one.  It  only 
serves  as  a  harbinger  of  what  may  become  a  fertile  ground  for  some  exciting  research  on 
applying  ML  techniques  in  software  development  and  maintenance. 

One  of  the  attractive  aspects  of  ML  techniques  is  the  fact  that  they  offer  an  invaluable 
complement  to  the  existing  repertoire  of  tools  so  as  to  make  it  easier  to  rise  to  the 
challenge  of  the  aforementioned  demanding  situations. 

4.  Applying  ML  Algorithms  to  SE  Tasks 

In  this  section,  we  formulate  the  identified  software  development  and  maintenance  tasks  as 
learning  problems  and  approach  the  tasks  using  machine  learning  algorithms. 

Figure 1. Dimensions of learning.

Table 1. Major types of learning methods¹.

ILP    D (global); general-to-specific
RL     unsupervised; through training episodes; actions with max. Q value; Q, TD

¹ The classification here is based on materials in [11].
The sets D and B refer to training data and domain theory, respectively.
The algorithms listed are only representatives from different types of learning.

Table 2. SE tasks and applicable ML methods.

SE tasks                   Applicable type(s) of learning
Requirement engineering    AL, BBN, CL, DT, ILP
Rapid prototyping          GP
Component reuse            IBL (CBR⁴)
Cost/effort prediction     IBL (CBR), DT, BBN, ANN
Defect prediction          BBN
Test oracle generation     AL (EBL⁵)
Test data adequacy         CL
Validation                 AL
Reverse engineering        CL

Component reuse

Component retrieval from a software repository is an important issue in supporting
software reuse. This task can be formulated as an instance-based learning problem as
follows:

1. Components in a software repository are represented as points in the n-dimensional
Euclidean space (or cases in a case base).

2. Information in a component can be divided into indexed and unindexed information
(attributes). Indexed information is used for retrieval purpose and unindexed
information is used for contextual purpose. Because of the curse of dimensionality
problem [11], the choice of indexed attributes must be judicious.

3. Queries to the repository for desirable components can be represented as constraints on
indexable attributes.

4. Similarity measures for the nearest neighbors of the desirable component can be based
on the standard Euclidean distance, distance-weighted measure, or symbolic measure.

5. The possible retrieval methods include: K-Nearest Neighbor, inductive retrieval, and
Locally Weighted Regression.

6. The adaptation of the retrieved component for the task at hand can be structural
(applying adaptation rules directly to the retrieved component), or derivational
(reusing adaptation rules that generated the original solution to produce a new
solution).
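
The retrieval part of this formulation (steps 1 to 5) can be sketched in Python as below. This is only an illustration under stated assumptions: the component names, the three numeric indexed attributes, and the `notes` field are hypothetical, the query is simplified to a target point rather than a set of constraints, and plain Euclidean distance with K-Nearest Neighbor is used.

```python
import math


def euclidean(a, b):
    """Standard Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(repository, query, k=2):
    """Return the names of the k components nearest to the query vector."""
    ranked = sorted(repository, key=lambda c: euclidean(c["index"], query))
    return [c["name"] for c in ranked[:k]]


# Hypothetical repository: "index" holds the indexed attributes used for
# retrieval, "notes" stands for unindexed (contextual) information.
repository = [
    {"name": "quick_sort", "index": (1.0, 0.0, 3.0), "notes": "in-place"},
    {"name": "merge_sort", "index": (1.0, 1.0, 3.0), "notes": "stable"},
    {"name": "hash_map", "index": (0.0, 0.0, 8.0), "notes": "open addressing"},
]

candidates = retrieve(repository, (1.0, 1.0, 2.0))
```

Keeping the index vector short reflects the remark on the curse of dimensionality: every extra indexed attribute changes which components count as neighbors.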

Rapid  prototyping 

Rapid prototyping is an important tool for understanding and validating software
requirements. In addition, software prototypes can be used for other purposes such as user
training and system testing [18]. Different prototyping techniques have been developed for
evolutionary and throw-away prototyping. The existing techniques can be augmented by
including a machine learning approach, i.e., the use of genetic programming.

⁴ CBR stands for case-based reasoning.
⁵ EBL refers to explanation-based learning.

In GP, a computer program is often represented as a program tree where the internal nodes
correspond to a set of functions used in the program and the external nodes (terminals)
indicate variables and constants used as input to functions. For a given problem, GP starts
with an initial population of randomly generated computer programs. The evolution
process of generating a final computer program that solves the given problem hinges on
some sort of fitness evaluation and probabilistically reproducing the next generation of the
program population through some genetic operations. Given a GP development
environment such as the one in [8], the framework of a GP-based rapid prototyping
process can be described as follows:

1. Define sets of functions and terminals to be used in the developed (prototype) systems.

2. Define a fitness function to be used in evaluating the worthiness of a generated
program. Test data (input values and expected output) may be needed in assisting the
evaluation.

3. Generate the initial program population.

4. Determine selection strategies for programs in the current generation to be included in
the next generation population.

5. Decide how the genetic operations (crossover and mutation) are carried out during
each generation and how often these operations are performed.

6. Specify the terminating criteria for the evolution process and the way of checking for
termination.

7. Translate the returned program into a desired programming language format.
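
The seven steps above can be sketched in a few dozen lines of Python. This is a toy illustration, not the environment of [8]. The function and terminal sets, the absolute-error fitness, truncation selection with elitism, the mutation rate, and the target behaviour x*x + x are all assumptions chosen to keep the example small.

```python
import operator
import random

# Step 1: function and terminal sets (hypothetical choices).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}
TERMINALS = ["x", 1, 2]


def random_tree(depth, rng):
    """Grow a random program tree: tuples are function nodes, others terminals."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, random_tree(depth - 1, rng), random_tree(depth - 1, rng))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree, cases):
    """Step 2: total absolute error on the test data (lower is better)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)


def paths(tree, path=()):
    """Yield the position of every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        yield from paths(tree[1], path + (1,))
        yield from paths(tree[2], path + (2,))


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)


def crossover(a, b, rng):
    """Step 5: graft a random subtree of b onto a random position of a."""
    donor = subtree(b, rng.choice(list(paths(b))))
    return replace(a, rng.choice(list(paths(a))), donor)


def mutate(tree, rng):
    """Step 5: replace a random subtree with a freshly grown one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(2, rng))


def evolve(cases, pop_size=60, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]  # step 3
    for _ in range(generations):  # step 6: fixed budget, or stop on a perfect fit
        population.sort(key=lambda t: fitness(t, cases))
        if fitness(population[0], cases) == 0:
            break
        survivors = population[: pop_size // 3]  # step 4: truncation selection
        children = [population[0]]  # keep the best program unchanged
        while len(children) < pop_size:
            child = crossover(rng.choice(survivors), rng.choice(survivors), rng)
            if rng.random() < 0.2:
                child = mutate(child, rng)
            children.append(child)
        population = children
    return min(population, key=lambda t: fitness(t, cases))


def to_source(tree):
    """Step 7: translate a program tree into a Python expression string."""
    if isinstance(tree, tuple):
        return f"({to_source(tree[1])} {SYMBOLS[tree[0]]} {to_source(tree[2])})"
    return str(tree)


CASES = [(x, x * x + x) for x in range(-2, 3)]  # test data: input, expected output
best = evolve(CASES)
prototype_source = to_source(best)
```

The run is not guaranteed to find a perfect program within the generation budget. In a prototyping setting, the fitness of the returned program indicates how far the prototype still is from the test data.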

Requirement  engineering 

Requirement  engineering  refers  to  the  process  of  establishing  the  services  a  system  should 
provide  and  the  constraints  under  which  it  must  operate  [18].  A  requirement  may  be 
functional  or  non-functional.  A  functional  requirement  describes  a  system  service  or 
function,  whereas  a  non-functional  requirement  represents  a  constraint  imposed  on  the 
system.  How  to  obtain  functional  requirements  of  a  system  is  the  focus  here.  The  situation 
in  which  ML  algorithms  will  be  particularly  useful  is  when  there  exist  empirical  data  from 
the  problem  domain  that  describe  how  the  system  should  react  to  certain  inputs.  Under  this 
circumstance,  functional  requirements  can  be  “learned”  from  the  data  through  some 
learning  algorithm. 

1. Let X and C be the domain and the co-domain of a system function f to be learned. The
data set D is defined as: $D = \{\langle x_i, c_k\rangle \mid x_i \in X \wedge c_k \in C\}$.

2. The target function f to be learned is such that $\forall x_i \in X$ and $\forall c_k \in C$, $f(x_i) = c_k$.

3. The learning methods applicable here have to be of supervised type. Depending on the
nature of the data set D, different learning algorithms (in AL, BBN, CL, DT, ILP) can
be utilized to capture (learn) a system's functional requirements.

Reverse  engineering 

Legacy systems are old systems that are critical to the operation of an organization which
uses them and that must still be maintained. Most legacy systems were developed before
software engineering techniques were widely used. Thus they may be poorly structured
and their documentation may be either out-of-date or non-existent. In order to
maintain a legacy system, the first task is to recover the design or specification
of the system from its source or executable code (hence the term reverse
engineering, or program comprehension and understanding). Below we describe a
framework for deriving the functional specification of a legacy software system from its
executable code.

1. Given the executable code p and its input data set X, and output set C, the training data
set D is defined as: $D = \{\langle x_i, p(x_i)\rangle \mid x_i \in X \wedge p(x_i) \in C\}$.

2. The process of deriving the functional specification f for p can be described as a
learning problem in which f is learned through some ML algorithm such that
$\forall x_i \in X\ [f(x_i) = p(x_i)]$.

3. Many supervised learning methods can be used here (e.g., CL).
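
As a small illustration of this framework, the sketch below runs a stand-in for the legacy code p over its input set to build D, and then learns f with the Find-S concept learning algorithm. Everything concrete here is an assumption: `legacy_p`, the three symbolic attributes, and the premise that the behaviour to recover is a Boolean concept expressible as a conjunction of attribute constraints (Find-S is only appropriate under that premise).

```python
from itertools import product


def legacy_p(x):
    """Stand-in for the legacy executable; the learner only sees its outputs."""
    role, region, flag = x
    return role == "admin" and flag == "on"


def find_s(examples):
    """Find-S: most specific conjunctive hypothesis covering all positive examples."""
    h = None
    for x, label in examples:
        if not label:
            continue  # Find-S ignores negative examples
        if h is None:
            h = list(x)
        else:
            # Generalize each attribute that disagrees with the new positive.
            h = [hv if hv == xv else "?" for hv, xv in zip(h, x)]
    return h


def matches(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return h is not None and all(hv in ("?", xv) for hv, xv in zip(h, x))


# Step 1: execute p on every input to obtain the training set D.
X = list(product(["admin", "guest"], ["eu", "us"], ["on", "off"]))
D = [(x, legacy_p(x)) for x in X]

# Step 2: learn the functional specification f from D.
f = find_s(D)
```

The learned hypothesis reads as a recovered specification: the output is true exactly when the first attribute is "admin" and the third is "on", whatever the second attribute is.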


Validation 


Verification and validation are important checking processes to make sure that an
implemented software system conforms to its specification. To check a software
implementation against its specification, we assume the availability of both a specification
and executable code. This checking process can be performed as an analytic learning
task as follows.

1. Let X and C be the domain and co-domain of the implementation (executable code) p,
which is defined as: $p: X \to C$.

2. The training set D is defined as: $D = \{\langle x_i, p(x_i)\rangle \mid x_i \in X\}$.

3. The specification for p is denoted as B, which corresponds to the domain theory in the
analytic learning.

4. The validation checking is defined to be: p is valid if
$\forall \langle x_i, p(x_i)\rangle \in D\ [B \wedge x_i \vdash p(x_i)]$.

5.  Explanation-based  learning  algorithms  can  be  utilized  to  carry  out  the  checking 
process. 
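
The shape of this check can be illustrated with a short Python sketch. It does not implement explanation-based learning: as a simplifying assumption, the derivability test in step 4 is replaced by directly evaluating an executable specification predicate. The implementation `p` (an insertion sort), the predicate `B`, and the sample inputs are all hypothetical.

```python
from collections import Counter


def p(xs):
    """Implementation under validation (hypothetical): insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def B(x, y):
    """Specification as a predicate: y is an ordered permutation of x."""
    ordered = all(a <= b for a, b in zip(y, y[1:]))
    return ordered and Counter(x) == Counter(y)


# Step 2: the training set pairs each sampled input with the observed output.
X = [[], [3, 1, 2], [2, 2, 1], [5, 4, 3, 2, 1]]
D = [(x, p(x)) for x in X]

# Step 4 (simplified): p is accepted if every observed pair satisfies B.
counterexamples = [(x, y) for x, y in D if not B(x, y)]
is_valid = not counterexamples
```

Any pair left in `counterexamples` is a concrete input on which the implementation departs from the specification.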

Test  oracle  generation 


Functional testing involves executing a program under test and examining the output from
the program. An oracle is needed in functional testing in order to determine if the output
from a program is correct. The oracle can be a human or a software one [13]. The
approach we propose here allows a test oracle to be learned as a function from the
specification and a small set of training data. The learned test oracle can then be used for
the functional testing purpose.

1. Let X and C be the domain and co-domain of the program p to be tested. Let B be the
specification for p.

2. Define a small training set D as: $D = \{\langle x_i, p(x_i)\rangle \mid x_i \in X' \wedge X' \subset X \wedge p(x_i) \in C\}$.

3. Use the explanation-based learning (EBL) to generate a test oracle $\Theta$ ($\Theta: X \to C$) for p
from B and D.

4. Use $\Theta$ for the functional testing: $\forall x_i \in X$ [output of p is correct if $p(x_i) = \Theta(x_i)$].

Test  adequacy  criteria 


Software test data adequacy criteria are rules that determine if a software product has been
adequately tested [21]. A test data adequacy criterion $\zeta$ is a function: $\zeta: P \times S \times T \to \{true, false\}$,
where P is a set of programs, S a set of specifications and T the class of test sets.
$\zeta(p, s, t) = true$ means that t is adequate for testing program p against specification s
according to criterion $\zeta$. Since $\zeta$ is essentially a Boolean function, we can use a strategy
such as CL to learn the test data adequacy criteria.

1. Define the instance space X as: $X = \{\langle p_i, s_j, t_k\rangle \mid p_i \in P \wedge s_j \in S \wedge t_k \in T\}$.

2. Define the training data set D as: $D = \{\langle x, \zeta(x)\rangle \mid x \in X \wedge \zeta(x) \in V\}$, where V is
defined as: $V = \{true, false\}$.

3. Use the concept of version space and the candidate-elimination algorithm in CL to
learn the definition of $\zeta$.
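
The idea of a version space can be shown with a brute-force sketch in Python. Note the substitution: instead of the candidate-elimination algorithm, the sketch uses the simpler List-Then-Eliminate procedure, which enumerates a small hypothesis space and keeps the hypotheses consistent with D. The three features that summarize an instance (statement coverage, branch coverage, test set size) and the labelled examples are hypothetical.

```python
from itertools import product

# Hypothetical feature values describing an instance <p, s, t>:
# (statement coverage, branch coverage, test set size).
DOMAINS = [("full", "partial"), ("full", "partial"), ("small", "large")]


def covers(h, x):
    """A conjunctive hypothesis covers x if every constraint is '?' or equal."""
    return all(hv in ("?", xv) for hv, xv in zip(h, x))


def version_space(D):
    """List-Then-Eliminate: keep every hypothesis consistent with all of D."""
    hypotheses = product(*[dom + ("?",) for dom in DOMAINS])
    return [h for h in hypotheses
            if all(covers(h, x) == label for x, label in D)]


# Training data: instances labelled with the (unknown) criterion's verdict.
D = [
    (("full", "full", "small"), True),
    (("full", "full", "large"), True),
    (("full", "partial", "large"), False),
    (("partial", "full", "small"), False),
]

vs = version_space(D)
```

With these four examples the version space collapses to a single hypothesis, which reads as "adequate when both statement and branch coverage are full, regardless of test set size". Enumeration is only feasible for very small hypothesis spaces, which is the reason candidate-elimination represents the version space by its boundary sets instead.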


Software  defect  prediction 


Software defect prediction is a very useful and important tool to gauge the likely delivered
quality and maintenance effort before software systems are deployed [4]. Predicting
defects requires a holistic model rather than a single-issue model that hinges on either size
or complexity, or testing metrics, or process quality data alone. It is argued in [4] that all
these factors must be taken into consideration in order for the defect prediction to be
successful.

Bayesian  Belief  Networks  (BBN)  prove  to  be  a  very  useful  approach  to  the  software 
defect  prediction  problem.  A  BBN  represents  the  joint  probability  distribution  for  a  set  of 
variables.  This  is  accomplished  by  specifying  (a)  a  directed  acyclic  graph  (DAG)  where 
nodes  represent  variables  and  arcs  correspond  to  conditional  independence  assumptions 
(causal  knowledge  about  the  problem  domain),  and  (b)  a  set  of  local  conditional 
probability  tables  (one  for  each  variable)  [7,  11].  A  BBN  can  be  used  to  infer  the 
probability  distribution  for  a  target  variable  (e.g.,  “Defects  Detected”),  which  specifies  the 
probability  that  the  variable  will  take  on  each  of  its  possible  values  (e.g.,  “very  low”, 
“low”,  “medium”,  “high”,  or  “very  high”  for  the  variable  “Defects  Detected”)  given  the 
observed  values  of  the  other  variables.  In  general,  a  BBN  can  be  used  to  compute  the 
probability  distribution  for  any  subset  of  variables  given  the  values  or  distributions  for  any 
subset  of  the  remaining  variables.  When  using  a  BBN  for  a  decision  support  system  such 
as  software  defect  prediction,  the  steps  below  should  be  followed. 

1. Identify variables in the BBN. Variables can be: (a) hypothesis variables for which the
user would like to find out their probability distributions (hypothesis variables are either
unobservable or too costly to observe), (b) information variables that can be observed,
or (c) mediating variables that are introduced for certain purposes (help reflect
independence properties, facilitate acquisition of conditional probabilities, and so
forth). Variables should be defined to reflect the life-cycle activities (specification,
design, implementation, and testing) and capture the multi-faceted nature of software
defects (perspectives from size, testing metrics and process quality). Variables are
denoted as nodes in the DAG.

2.  Define  the  proper  causal  relationships  among  variables.  These  relationships  also 
should  capture  and  reflect  the  causality  exhibited  in  the  software  life-cycle  processes. 
They  will  be  represented  as  arcs  in  the  corresponding  DAG. 

3. Acquire a probability distribution for each variable in the BBN. Theoretically
well-founded probabilities, or frequencies, or subjective estimates can all be used in the
BBN. The result is a set of conditional probability tables, one for each variable. The
full joint probability distribution for all the defect-centric variables is embodied in the
DAG structure and the set of conditional probability tables.
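
To make the three steps concrete, the following Python sketch builds a four-variable network and answers a query by exhaustive enumeration of the joint distribution. The network structure, the variable names (E for design effort, T for testing effort, I for defects introduced, D for defects detected) and every probability value are hypothetical illustrations, not data from [4]. Enumeration is used only because the network is tiny.

```python
from itertools import product

# Step 1: variables and their possible values (hypothetical).
VALUES = {
    "E": ("low", "high"),   # design effort (information variable)
    "T": ("low", "high"),   # testing effort (information variable)
    "I": ("few", "many"),   # defects introduced (hypothesis variable)
    "D": ("few", "many"),   # defects detected (information variable)
}
# Step 2: arcs of the DAG, given as the parents of each variable.
PARENTS = {"E": (), "T": (), "I": ("E",), "D": ("I", "T")}
# Step 3: conditional probability tables. Each entry is the probability that
# the variable takes its SECOND value, given the values of its parents.
CPT = {
    "E": {(): 0.5},
    "T": {(): 0.5},
    "I": {("low",): 0.7, ("high",): 0.2},
    "D": {("few", "low"): 0.05, ("few", "high"): 0.2,
          ("many", "low"): 0.4, ("many", "high"): 0.9},
}


def prob(var, value, assignment):
    """P(var = value | parents as set in assignment)."""
    p_second = CPT[var][tuple(assignment[p] for p in PARENTS[var])]
    return p_second if value == VALUES[var][1] else 1.0 - p_second


def joint(assignment):
    """Joint probability of a full assignment: product of the local tables."""
    result = 1.0
    for var in VALUES:
        result *= prob(var, assignment[var], assignment)
    return result


def query(target, evidence):
    """Posterior distribution of target given evidence, by enumeration."""
    totals = {v: 0.0 for v in VALUES[target]}
    names = list(VALUES)
    for combo in product(*(VALUES[n] for n in names)):
        assignment = dict(zip(names, combo))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[target]] += joint(assignment)
    norm = sum(totals.values())
    return {v: t / norm for v, t in totals.items()}


# Low design effort and many detected defects; testing effort is unobserved.
posterior = query("I", {"E": "low", "D": "many"})
```

Testing effort is left unobserved in the query, so it is summed out. This is the sense in which a BBN can compute the distribution of any variable given values for any subset of the others.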

Project  effort  (cost)  prediction 

How to estimate the cost of a software project is a very important issue in software
project management. Most of the existing work is based on algorithmic models of effort
[17]. A viable alternative approach to project effort prediction is instance-based
learning. IBL yields very good performance for situations where an algorithmic model for
the prediction is not possible. In the framework of IBL, the prediction process can be
carried out as follows.

1.  Introduce  a  set  of  features  or  attributes  (e.g.,  number  of  interfaces,  size  of  functional 
requirements,  development  tools  and  methods,  and  so  forth)  to  characterize  projects. 
The  decision  on  the  number  of  features  has  to  be  judicious,  as  this  may  become  the 
cause  of  the  curse  of  dimensionality  problem  that  will  affect  the  prediction  accuracy. 

2.  Collect  data  on  completed  projects  and  store  them  as  instances  in  the  case  base. 

3. Define similarity or distance between instances in the case base according to the
symbolic representations of instances (e.g., Euclidean distance in an n-dimensional
space where n is the number of features used). To overcome the potential curse of
dimensionality problem, features may be weighted differently when calculating the
distance (or similarity) between two instances.

4. Given a query for predicting the effort of a new project, use an algorithm such as
K-Nearest Neighbor or Locally Weighted Regression to retrieve similar projects and use
them as the basis for returning the prediction result.
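
A minimal Python sketch of steps 1 to 4 follows, using K-Nearest Neighbor with a feature-weighted Euclidean distance and averaging the efforts of the retrieved projects. The two features (number of interfaces, size of functional requirements), the weights, and the case base are hypothetical, and averaging is only one possible way of turning the neighbors into a prediction.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature has its own weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def predict_effort(case_base, query, weights, k=2):
    """Average the recorded effort of the k completed projects nearest to query."""
    nearest = sorted(
        case_base, key=lambda case: weighted_distance(case[0], query, weights)
    )[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Step 2: completed projects stored as (features, effort in person-days).
# Features (hypothetical): number of interfaces, size of functional requirements.
case_base = [
    ((5, 20), 100.0),
    ((6, 22), 120.0),
    ((20, 90), 600.0),
    ((18, 80), 520.0),
]

# Step 3: weigh the second feature less, since its scale is larger.
weights = (1.0, 0.1)

# Step 4: predict the effort of a new project from its two nearest neighbors.
estimate = predict_effort(case_base, (6, 25), weights)
```

Choosing the weights is itself a modeling decision. Here they merely compensate for the different scales of the two features.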


5.  Existing  Work 

Several areas in software development have already witnessed the use of machine learning
methods. In this section, we take a look at some reported results. The list is definitely not a
complete one. It only serves as an indication that people realize the potential of ML
techniques and begin to reap the benefits from applying them in software development and
maintenance.


Scenario-based  requirement  engineering 

The  work  reported  in  [9]  describes  a  formal  method  for  supporting  the  process  of  inferring 
specifications  of  system  goals  and  requirements  inductively  from  interaction  scenarios 
provided  by  stakeholders.  The  method  is  based  on  a  learning  algorithm  that  takes 
scenarios  as  examples  and  counter-examples  (positive  and  negative  scenarios)  and 
generates  goal  specifications  as  temporal  rules. 


A related work in [6] presents a scenario-based elicitation and validation assistant that
helps requirements engineers acquire and maintain a specification consistent with
scenarios provided. The system relies on explanation-based learning (EBL) to generalize
scenarios to state and prove validation lemmas.


Software  project  effort  estimation 


Instance-based learning techniques are used in [17] for predicting the software project
effort for new projects. The empirical results obtained (from nine different industrial data
sets totaling 275 projects) indicate that case-based reasoning offers a viable complement
to the existing prediction and estimation techniques. A related CBR application in
software effort estimation is given in [20].


Decision trees (DT) and artificial neural networks (ANN) are used in [19] to help predict
software  development  effort.  The  results  were  competitive  with  conventional  methods 
such  as  COCOMO  and  function  points.  The  main  advantage  of  DT  and  ANN  based 
estimation  systems  is  that  they  are  adaptable  and  nonparametric. 

The result reported in [3] indicates that improved predictive performance can be
obtained through the use of Bayesian analysis. Additional research on ML based software
effort estimation can be found in [2,14,15,16].

Software  defect  prediction 


Bayesian  belief  networks  are  used  in  [4]  to  predict  software  defects.  Though  the  system 
reported  is  only  a  prototype,  it  shows  the  potential  BBN  has  in  incorporating  multiple 
perspectives  on  defect  prediction  into  a  single,  unified  model. 


Variables in the prototype BBN system [4] are chosen to represent the life-cycle processes
of specification, design and implementation, and testing (Problem-Complexity,
Design-Effort, Design-Size, Defects-Introduced, Testing-Effort, Defects-Detected,
Defects-Density-At-Testing, Residual-Defect-Count, and Residual-Defect-Density). The proper
causal relationships among those software life-cycle processes are then captured and
reflected as arcs connecting the variables.


A tool is then used with the BBN model in the following manner. Given facts
about Design-Effort and Design-Size as input, the tool will use Bayesian inference to
derive the probability distributions for Defects-Introduced, Defects-Detected and
Defect-Density.


6.  Concluding  Remarks 

In  this  paper,  we  show  how  ML  algorithms  can  be  used  in  tackling  software  engineering 
problems.  ML  algorithms  not  only  can  be  used  to  build  tools  for  software  development 
and  maintenance  tasks,  but  also  can  be  incorporated  into  software  products  to  make  them 
adaptive  and  self-configuring.  A  maturing  software  engineering  discipline  will  definitely 
be  able  to  benefit  from  the  utility  of  ML  techniques. 

What lies ahead is the issue of realizing the promise and potential ML techniques have to
offer in the circumstances discussed in Section 4. In addition, expanding the frontier of
ML application in software engineering is another direction worth pursuing.